Overview

Dataset statistics

Number of variables: 65
Number of observations: 4272
Missing cells: 174608
Missing cells (%): 62.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.1 MiB
Average record size in memory: 520.0 B
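
The percentages in this block follow from the raw counts. As a quick sanity check (a minimal sketch using only the figures reported here; the variable names are illustrative), the missing-cell share is the missing count divided by the total number of cells. The 8-bytes-per-column reading of the record size is an assumption, not something the report states.

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 65
n_observations = 4272
missing_cells = 174608

total_cells = n_variables * n_observations       # every cell in the table
missing_pct = 100 * missing_cells / total_cells  # share of empty cells

# Assumption: each column stores one 8-byte value or pointer per row,
# which would explain the reported 520.0 B average record size.
bytes_per_record = n_variables * 8

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"bytes per record: {bytes_per_record}")
```

With nearly 63% of all cells empty, most of the `_2` columns listed under Alerts carry data for only a small share of the records.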

Variable types

Numeric: 7
Categorical: 58

Alerts

repeat_instrument_1 has constant value "" [Constant]
repeat_instrument_2 has constant value "" [Constant]
repeat_instance_1 has constant value "" [Constant]
repeat_instance_2 has constant value "" [Constant]
local_de_recidiva_a_distancia_metastase_4_cid_o_topografia_2 has constant value "" [Constant]
data_da_primeira_consulta_institucional_dt_pci_1 has a high cardinality: 2463 distinct values [High cardinality]
data_da_primeira_consulta_institucional_dt_pci_2 has a high cardinality: 349 distinct values [High cardinality]
data_do_diagnostico_1 has a high cardinality: 2460 distinct values [High cardinality]
data_do_diagnostico_2 has a high cardinality: 358 distinct values [High cardinality]
codigo_da_topografia_cid_o_2 has a high cardinality: 72 distinct values [High cardinality]
data_do_tratamento_1 has a high cardinality: 2405 distinct values [High cardinality]
data_do_tratamento_2 has a high cardinality: 322 distinct values [High cardinality]
data_de_recidiva_1 has a high cardinality: 1021 distinct values [High cardinality]
descricao_da_morfologia_de_acordo_com_cid_o_2 has a high cardinality: 73 distinct values [High cardinality]
descricao_da_topografia_2 has a high cardinality: 72 distinct values [High cardinality]
classificacao_tnm_clinico_n_2 is highly imbalanced (54.4%) [Imbalance]
classificacao_tnm_clinico_m_1 is highly imbalanced (70.8%) [Imbalance]
classificacao_tnm_clinico_m_2 is highly imbalanced (65.0%) [Imbalance]
descricao_da_morfologia_de_acordo_com_cid_o_1 is highly imbalanced (82.6%) [Imbalance]
com_recidiva_a_distancia_2 is highly imbalanced (50.5%) [Imbalance]
com_recidiva_regional_1 is highly imbalanced (66.0%) [Imbalance]
com_recidiva_regional_2 is highly imbalanced (80.7%) [Imbalance]
com_recidiva_local_1 is highly imbalanced (60.7%) [Imbalance]
com_recidiva_local_2 is highly imbalanced (64.3%) [Imbalance]
repeat_instrument_2 has 3903 (91.4%) missing values [Missing]
repeat_instance_2 has 3903 (91.4%) missing values [Missing]
data_da_primeira_consulta_institucional_dt_pci_2 has 3903 (91.4%) missing values [Missing]
data_do_diagnostico_2 has 3903 (91.4%) missing values [Missing]
codigo_da_topografia_cid_o_2 has 3903 (91.4%) missing values [Missing]
codigo_da_morfologia_de_acordo_com_o_cid_o_2 has 3903 (91.4%) missing values [Missing]
estadio_clinico_2 has 3903 (91.4%) missing values [Missing]
grupo_de_estadio_clinico_1 has 195 (4.6%) missing values [Missing]
grupo_de_estadio_clinico_2 has 3959 (92.7%) missing values [Missing]
classificacao_tnm_clinico_t_2 has 3903 (91.4%) missing values [Missing]
classificacao_tnm_clinico_n_2 has 3903 (91.4%) missing values [Missing]
classificacao_tnm_clinico_m_2 has 3903 (91.4%) missing values [Missing]
metastase_ao_diagnostico_cid_o_1_1 has 3600 (84.3%) missing values [Missing]
metastase_ao_diagnostico_cid_o_1_2 has 4233 (99.1%) missing values [Missing]
metastase_ao_diagnostico_cid_o_2_1 has 3898 (91.2%) missing values [Missing]
metastase_ao_diagnostico_cid_o_2_2 has 4257 (99.6%) missing values [Missing]
metastase_ao_diagnostico_cid_o_3_1 has 4097 (95.9%) missing values [Missing]
metastase_ao_diagnostico_cid_o_3_2 has 4266 (99.9%) missing values [Missing]
metastase_ao_diagnostico_cid_o_4_1 has 4205 (98.4%) missing values [Missing]
metastase_ao_diagnostico_cid_o_4_2 has 4270 (> 99.9%) missing values [Missing]
data_do_tratamento_2 has 3928 (91.9%) missing values [Missing]
combinacao_dos_tratamentos_realizados_no_hospital_2 has 3903 (91.4%) missing values [Missing]
ano_do_diagnostico_2 has 3903 (91.4%) missing values [Missing]
lateralidade_do_tumor_2 has 3903 (91.4%) missing values [Missing]
data_de_recidiva_1 has 3023 (70.8%) missing values [Missing]
data_de_recidiva_2 has 4226 (98.9%) missing values [Missing]
tempo_desde_o_diagnostico_ate_a_primeira_recidiv_1 has 3023 (70.8%) missing values [Missing]
tempo_desde_o_diagnostico_ate_a_primeira_recidiv_2 has 4226 (98.9%) missing values [Missing]
local_de_recidiva_a_distancia_metastase_1_cid_o_topografia_1 has 3282 (76.8%) missing values [Missing]
local_de_recidiva_a_distancia_metastase_1_cid_o_topografia_2 has 4231 (99.0%) missing values [Missing]
local_de_recidiva_a_distancia_metastase_2_cid_o_topografia_1 has 3737 (87.5%) missing values [Missing]
local_de_recidiva_a_distancia_metastase_2_cid_o_topografia_2 has 4253 (99.6%) missing values [Missing]
local_de_recidiva_a_distancia_metastase_3_cid_o_topografia_1 has 4013 (93.9%) missing values [Missing]
local_de_recidiva_a_distancia_metastase_3_cid_o_topografia_2 has 4267 (99.9%) missing values [Missing]
local_de_recidiva_a_distancia_metastase_4_cid_o_topografia_1 has 4161 (97.4%) missing values [Missing]
local_de_recidiva_a_distancia_metastase_4_cid_o_topografia_2 has 4271 (> 99.9%) missing values [Missing]
descricao_da_morfologia_de_acordo_com_cid_o_2 has 3903 (91.4%) missing values [Missing]
descricao_da_topografia_2 has 3903 (91.4%) missing values [Missing]
classificacao_tnm_patologico_n_1 has 4086 (95.6%) missing values [Missing]
classificacao_tnm_patologico_n_2 has 4267 (99.9%) missing values [Missing]
classificacao_tnm_patologico_t_1 has 4085 (95.6%) missing values [Missing]
classificacao_tnm_patologico_t_2 has 4267 (99.9%) missing values [Missing]
com_recidiva_a_distancia_2 has 3903 (91.4%) missing values [Missing]
com_recidiva_regional_2 has 3903 (91.4%) missing values [Missing]
com_recidiva_local_2 has 3903 (91.4%) missing values [Missing]
data_da_primeira_consulta_institucional_dt_pci_1 is uniformly distributed [Uniform]
data_da_primeira_consulta_institucional_dt_pci_2 is uniformly distributed [Uniform]
data_do_diagnostico_1 is uniformly distributed [Uniform]
data_do_diagnostico_2 is uniformly distributed [Uniform]
metastase_ao_diagnostico_cid_o_4_2 is uniformly distributed [Uniform]
data_do_tratamento_1 is uniformly distributed [Uniform]
data_do_tratamento_2 is uniformly distributed [Uniform]
data_de_recidiva_1 is uniformly distributed [Uniform]
data_de_recidiva_2 is uniformly distributed [Uniform]
record_id has unique values [Unique]

Reproduction

Analysis started: 2023-02-28 14:19:40.172842
Analysis finished: 2023-02-28 14:20:21.324659
Duration: 41.15 seconds
Software version: ydata-profiling v4.0.0
Download configuration: config.json
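
The reported duration can be recovered from the two timestamps. A minimal sketch using only the standard library (the variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block.
started = datetime.fromisoformat("2023-02-28 14:19:40.172842")
finished = datetime.fromisoformat("2023-02-28 14:20:21.324659")

# Subtracting two datetimes yields a timedelta.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```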

Variables

record_id
Real number (ℝ)

Distinct: 4272
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 48652.36
Minimum: 302
Maximum: 82240
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.5 KiB

Quantile statistics

Minimum: 302
5-th percentile: 13992.4
Q1: 31013
Median: 53394
Q3: 65816.75
95-th percentile: 78668.25
Maximum: 82240
Range: 81938
Interquartile range (IQR): 34803.75

Descriptive statistics

Standard deviation: 20659.52
Coefficient of variation (CV): 0.4246355
Kurtosis: -0.99374558
Mean: 48652.36
Median Absolute Deviation (MAD): 16732
Skewness: -0.29501895
Sum: 2.0784288 × 10^8
Variance: 4.2681575 × 10^8
Monotonicity: Strictly increasing
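
Several of the record_id statistics are derived from others in the same block. The sketch below recomputes them from the reported values only (no access to the underlying data is assumed, so results that depend on the rounded mean and standard deviation are approximate):

```python
# Values copied from the record_id statistics.
n = 4272
minimum, maximum = 302, 82240
q1, q3 = 31013, 65816.75
mean, std = 48652.36, 20659.52

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance (approximate: std is rounded)
total = mean * n                 # Sum (approximate: mean is rounded)

print(value_range, iqr, round(cv, 4))
print(f"{variance:.4e}", f"{total:.4e}")
```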
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
302                  1      < 0.1%
60912                1      < 0.1%
60757                1      < 0.1%
60774                1      < 0.1%
60777                1      < 0.1%
60799                1      < 0.1%
60815                1      < 0.1%
60825                1      < 0.1%
60826                1      < 0.1%
60840                1      < 0.1%
Other values (4262)  4262   99.8%

Value  Count  Frequency (%)
302    1      < 0.1%
710    1      < 0.1%
752    1      < 0.1%
1367   1      < 0.1%
1589   1      < 0.1%
1705   1      < 0.1%
1843   1      < 0.1%
1873   1      < 0.1%
1898   1      < 0.1%
1960   1      < 0.1%

Value  Count  Frequency (%)
82240  1      < 0.1%
82205  1      < 0.1%
82131  1      < 0.1%
82124  1      < 0.1%
82123  1      < 0.1%
82122  1      < 0.1%
82118  1      < 0.1%
82112  1      < 0.1%
82111  1      < 0.1%
82100  1      < 0.1%

repeat_instrument_1
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.5 KiB

Registro De Tumores  4272

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

Characters and Unicode

Total characters: 81168
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
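
As an illustration of those character properties, the sketch below uses Python's standard `unicodedata` module to classify the characters of this variable's constant value and scale the counts by the number of rows. This is not the report's own implementation; it only shows that the category totals listed for this variable follow from the single value.

```python
import unicodedata
from collections import Counter

value = "Registro De Tumores"  # the constant value of this variable
n_rows = 4272                  # non-missing observations

# General category of every character: Ll = lowercase letter,
# Lu = uppercase letter, Zs = space separator.
per_value = Counter(unicodedata.category(ch) for ch in value)

# Scale by the row count to get column-level totals.
totals = {cat: count * n_rows for cat, count in per_value.items()}

print(len(value) * n_rows)  # total characters
print(len(set(value)))      # distinct characters
print(totals)
```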

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Registro De Tumores
2nd row: Registro De Tumores
3rd row: Registro De Tumores
4th row: Registro De Tumores
5th row: Registro De Tumores

Common Values

Value                Count  Frequency (%)
Registro De Tumores  4272   100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
registro 4272
33.3%
de 4272
33.3%
tumores 4272
33.3%

Most occurring characters

Value             Count  Frequency (%)
e                 12816  15.8%
s                 8544   10.5%
r                 8544   10.5%
o                 8544   10.5%
(space)           8544   10.5%
R                 4272   5.3%
g                 4272   5.3%
i                 4272   5.3%
t                 4272   5.3%
D                 4272   5.3%
Other values (3)  12816  15.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 59808
73.7%
Uppercase Letter 12816
 
15.8%
Space Separator 8544
 
10.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 12816
21.4%
s 8544
14.3%
r 8544
14.3%
o 8544
14.3%
g 4272
 
7.1%
i 4272
 
7.1%
t 4272
 
7.1%
u 4272
 
7.1%
m 4272
 
7.1%
Uppercase Letter
ValueCountFrequency (%)
R 4272
33.3%
D 4272
33.3%
T 4272
33.3%
Space Separator
ValueCountFrequency (%)
8544
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 72624
89.5%
Common 8544
 
10.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 12816
17.6%
s 8544
11.8%
r 8544
11.8%
o 8544
11.8%
R 4272
 
5.9%
g 4272
 
5.9%
i 4272
 
5.9%
t 4272
 
5.9%
D 4272
 
5.9%
T 4272
 
5.9%
Other values (2) 8544
11.8%
Common
ValueCountFrequency (%)
8544
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 81168
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 12816
15.8%
s 8544
10.5%
r 8544
10.5%
o 8544
10.5%
8544
10.5%
R 4272
 
5.3%
g 4272
 
5.3%
i 4272
 
5.3%
t 4272
 
5.3%
D 4272
 
5.3%
Other values (3) 12816
15.8%

repeat_instrument_2
Categorical

CONSTANT  MISSING 

Distinct: 1
Distinct (%): 0.3%
Missing: 3903
Missing (%): 91.4%
Memory size: 33.5 KiB

Registro De Tumores  369

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

Characters and Unicode

Total characters: 7011
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Registro De Tumores
2nd row: Registro De Tumores
3rd row: Registro De Tumores
4th row: Registro De Tumores
5th row: Registro De Tumores

Common Values

Value                Count  Frequency (%)
Registro De Tumores  369    8.6%
(Missing)            3903   91.4%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
registro 369
33.3%
de 369
33.3%
tumores 369
33.3%

Most occurring characters

ValueCountFrequency (%)
e 1107
15.8%
s 738
10.5%
r 738
10.5%
o 738
10.5%
738
10.5%
R 369
 
5.3%
g 369
 
5.3%
i 369
 
5.3%
t 369
 
5.3%
D 369
 
5.3%
Other values (3) 1107
15.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5166
73.7%
Uppercase Letter 1107
 
15.8%
Space Separator 738
 
10.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1107
21.4%
s 738
14.3%
r 738
14.3%
o 738
14.3%
g 369
 
7.1%
i 369
 
7.1%
t 369
 
7.1%
u 369
 
7.1%
m 369
 
7.1%
Uppercase Letter
ValueCountFrequency (%)
R 369
33.3%
D 369
33.3%
T 369
33.3%
Space Separator
ValueCountFrequency (%)
738
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 6273
89.5%
Common 738
 
10.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1107
17.6%
s 738
11.8%
r 738
11.8%
o 738
11.8%
R 369
 
5.9%
g 369
 
5.9%
i 369
 
5.9%
t 369
 
5.9%
D 369
 
5.9%
T 369
 
5.9%
Other values (2) 738
11.8%
Common
ValueCountFrequency (%)
738
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 7011
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 1107
15.8%
s 738
10.5%
r 738
10.5%
o 738
10.5%
738
10.5%
R 369
 
5.3%
g 369
 
5.3%
i 369
 
5.3%
t 369
 
5.3%
D 369
 
5.3%
Other values (3) 1107
15.8%

repeat_instance_1
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.5 KiB

1.0  4272

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 12816
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

ValueCountFrequency (%)
1.0 4272
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1.0 4272
100.0%

Most occurring characters

ValueCountFrequency (%)
1 4272
33.3%
. 4272
33.3%
0 4272
33.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 8544
66.7%
Other Punctuation 4272
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 4272
50.0%
0 4272
50.0%
Other Punctuation
ValueCountFrequency (%)
. 4272
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 12816
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 4272
33.3%
. 4272
33.3%
0 4272
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 12816
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 4272
33.3%
. 4272
33.3%
0 4272
33.3%

repeat_instance_2
Categorical

CONSTANT  MISSING 

Distinct: 1
Distinct (%): 0.3%
Missing: 3903
Missing (%): 91.4%
Memory size: 33.5 KiB

2.0  369

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 1107
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2.0
2nd row: 2.0
3rd row: 2.0
4th row: 2.0
5th row: 2.0

Common Values

ValueCountFrequency (%)
2.0 369
 
8.6%
(Missing) 3903
91.4%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2.0 369
100.0%

Most occurring characters

ValueCountFrequency (%)
2 369
33.3%
. 369
33.3%
0 369
33.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 738
66.7%
Other Punctuation 369
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 369
50.0%
0 369
50.0%
Other Punctuation
ValueCountFrequency (%)
. 369
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1107
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2 369
33.3%
. 369
33.3%
0 369
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1107
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 369
33.3%
. 369
33.3%
0 369
33.3%

data_da_primeira_consulta_institucional_dt_pci_1
Categorical

HIGH CARDINALITY  UNIFORM 

Distinct: 2463
Distinct (%): 57.7%
Missing: 0
Missing (%): 0.0%
Memory size: 33.5 KiB

2011-08-14           8
2017-05-12           7
2017-04-21           6
2018-01-02           6
2016-04-12           6
Other values (2458)  4239

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 42720
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1328
Unique (%): 31.1%

Sample

1st row: 2008-03-22
2nd row: 2006-11-11
3rd row: 2007-09-25
4th row: 2008-02-03
5th row: 2008-05-15

Common Values

ValueCountFrequency (%)
2011-08-14 8
 
0.2%
2017-05-12 7
 
0.2%
2017-04-21 6
 
0.1%
2018-01-02 6
 
0.1%
2016-04-12 6
 
0.1%
2014-11-15 6
 
0.1%
2017-07-30 6
 
0.1%
2017-06-10 6
 
0.1%
2015-09-30 6
 
0.1%
2016-06-09 6
 
0.1%
Other values (2453) 4209
98.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2011-08-14 8
 
0.2%
2017-05-12 7
 
0.2%
2015-09-30 6
 
0.1%
2017-11-07 6
 
0.1%
2016-09-01 6
 
0.1%
2016-02-18 6
 
0.1%
2015-08-29 6
 
0.1%
2016-06-09 6
 
0.1%
2016-03-26 6
 
0.1%
2017-06-10 6
 
0.1%
Other values (2453) 4209
98.5%

Most occurring characters

ValueCountFrequency (%)
0 10005
23.4%
- 8544
20.0%
1 8206
19.2%
2 7289
17.1%
5 1470
 
3.4%
6 1441
 
3.4%
7 1416
 
3.3%
3 1381
 
3.2%
4 1104
 
2.6%
8 1003
 
2.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 34176
80.0%
Dash Punctuation 8544
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 10005
29.3%
1 8206
24.0%
2 7289
21.3%
5 1470
 
4.3%
6 1441
 
4.2%
7 1416
 
4.1%
3 1381
 
4.0%
4 1104
 
3.2%
8 1003
 
2.9%
9 861
 
2.5%
Dash Punctuation
ValueCountFrequency (%)
- 8544
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 42720
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 10005
23.4%
- 8544
20.0%
1 8206
19.2%
2 7289
17.1%
5 1470
 
3.4%
6 1441
 
3.4%
7 1416
 
3.3%
3 1381
 
3.2%
4 1104
 
2.6%
8 1003
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 42720
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 10005
23.4%
- 8544
20.0%
1 8206
19.2%
2 7289
17.1%
5 1470
 
3.4%
6 1441
 
3.4%
7 1416
 
3.3%
3 1381
 
3.2%
4 1104
 
2.6%
8 1003
 
2.3%

data_da_primeira_consulta_institucional_dt_pci_2
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct: 349
Distinct (%): 94.6%
Missing: 3903
Missing (%): 91.4%
Memory size: 33.5 KiB

2017-08-18          3
2017-10-03          2
2017-07-24          2
2016-04-24          2
2017-10-20          2
Other values (344)  358

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 3690
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 330
Unique (%): 89.4%

Sample

1st row: 2014-05-12
2nd row: 2009-06-29
3rd row: 2016-04-11
4th row: 2010-07-24
5th row: 2007-10-17

Common Values

ValueCountFrequency (%)
2017-08-18 3
 
0.1%
2017-10-03 2
 
< 0.1%
2017-07-24 2
 
< 0.1%
2016-04-24 2
 
< 0.1%
2017-10-20 2
 
< 0.1%
2016-12-25 2
 
< 0.1%
2018-02-07 2
 
< 0.1%
2016-03-26 2
 
< 0.1%
2016-08-27 2
 
< 0.1%
2016-09-28 2
 
< 0.1%
Other values (339) 348
 
8.1%
(Missing) 3903
91.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2017-08-18 3
 
0.8%
2016-04-22 2
 
0.5%
2017-10-03 2
 
0.5%
2018-06-01 2
 
0.5%
2014-05-12 2
 
0.5%
2016-01-26 2
 
0.5%
2016-04-11 2
 
0.5%
2018-04-16 2
 
0.5%
2016-04-15 2
 
0.5%
2017-12-18 2
 
0.5%
Other values (339) 348
94.3%
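
The two frequency tables for this variable report different percentages for the same counts. The figures are consistent with the first table dividing by all 4272 observations and the second by the 369 non-missing ones. A small sketch of that reading follows; the `pct` helper and its 0.05% cut-off for "< 0.1%" are assumptions chosen to match the printed values, not code taken from the report.

```python
n_total = 4272    # all observations
n_missing = 3903  # missing values for this variable
n_present = n_total - n_missing

def pct(count: int, denominator: int) -> str:
    """Format a share as a percentage, flagging values that round to zero."""
    share = 100 * count / denominator
    return "< 0.1%" if share < 0.05 else f"{share:.1f}%"

print(n_present)            # 369
print(pct(3, n_total))      # 0.1%
print(pct(2, n_total))      # < 0.1%
print(pct(3, n_present))    # 0.8%
print(pct(348, n_present))  # 94.3%
```

The same distinction applies to every variable flagged as Missing: percentages next to a "(Missing)" row are shares of all records, while tables without that row are shares of the populated records only.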

Most occurring characters

ValueCountFrequency (%)
0 841
22.8%
- 738
20.0%
1 682
18.5%
2 624
16.9%
7 150
 
4.1%
6 133
 
3.6%
8 116
 
3.1%
4 114
 
3.1%
3 110
 
3.0%
5 99
 
2.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 2952
80.0%
Dash Punctuation 738
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 841
28.5%
1 682
23.1%
2 624
21.1%
7 150
 
5.1%
6 133
 
4.5%
8 116
 
3.9%
4 114
 
3.9%
3 110
 
3.7%
5 99
 
3.4%
9 83
 
2.8%
Dash Punctuation
ValueCountFrequency (%)
- 738
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 3690
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 841
22.8%
- 738
20.0%
1 682
18.5%
2 624
16.9%
7 150
 
4.1%
6 133
 
3.6%
8 116
 
3.1%
4 114
 
3.1%
3 110
 
3.0%
5 99
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3690
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 841
22.8%
- 738
20.0%
1 682
18.5%
2 624
16.9%
7 150
 
4.1%
6 133
 
3.6%
8 116
 
3.1%
4 114
 
3.1%
3 110
 
3.0%
5 99
 
2.7%

data_do_diagnostico_1
Categorical

HIGH CARDINALITY  UNIFORM 

Distinct: 2460
Distinct (%): 57.6%
Missing: 0
Missing (%): 0.0%
Memory size: 33.5 KiB

2011-02-01           7
2012-03-17           7
2015-08-02           7
2016-05-27           7
2015-08-13           6
Other values (2455)  4238

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 42720
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1312
Unique (%): 30.7%

Sample

1st row: 2008-03-23
2nd row: 2007-11-11
3rd row: 2007-12-18
4th row: 2008-02-06
5th row: 2008-05-21

Common Values

ValueCountFrequency (%)
2011-02-01 7
 
0.2%
2012-03-17 7
 
0.2%
2015-08-02 7
 
0.2%
2016-05-27 7
 
0.2%
2015-08-13 6
 
0.1%
2015-07-06 6
 
0.1%
2015-06-17 6
 
0.1%
2012-03-29 6
 
0.1%
2015-06-10 6
 
0.1%
2016-07-23 6
 
0.1%
Other values (2450) 4208
98.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2011-02-01 7
 
0.2%
2015-08-02 7
 
0.2%
2016-05-27 7
 
0.2%
2012-03-17 7
 
0.2%
2015-08-13 6
 
0.1%
2015-07-06 6
 
0.1%
2015-06-17 6
 
0.1%
2012-03-29 6
 
0.1%
2015-06-10 6
 
0.1%
2016-07-23 6
 
0.1%
Other values (2450) 4208
98.5%

Most occurring characters

ValueCountFrequency (%)
0 10021
23.5%
- 8544
20.0%
1 8145
19.1%
2 7255
17.0%
6 1443
 
3.4%
5 1440
 
3.4%
3 1426
 
3.3%
7 1398
 
3.3%
4 1116
 
2.6%
8 968
 
2.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 34176
80.0%
Dash Punctuation 8544
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 10021
29.3%
1 8145
23.8%
2 7255
21.2%
6 1443
 
4.2%
5 1440
 
4.2%
3 1426
 
4.2%
7 1398
 
4.1%
4 1116
 
3.3%
8 968
 
2.8%
9 964
 
2.8%
Dash Punctuation
ValueCountFrequency (%)
- 8544
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 42720
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 10021
23.5%
- 8544
20.0%
1 8145
19.1%
2 7255
17.0%
6 1443
 
3.4%
5 1440
 
3.4%
3 1426
 
3.3%
7 1398
 
3.3%
4 1116
 
2.6%
8 968
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 42720
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 10021
23.5%
- 8544
20.0%
1 8145
19.1%
2 7255
17.0%
6 1443
 
3.4%
5 1440
 
3.4%
3 1426
 
3.3%
7 1398
 
3.3%
4 1116
 
2.6%
8 968
 
2.3%

data_do_diagnostico_2
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct: 358
Distinct (%): 97.0%
Missing: 3903
Missing (%): 91.4%
Memory size: 33.5 KiB

2017-12-17          2
2017-06-10          2
2018-07-28          2
2017-07-06          2
2014-01-28          2
Other values (353)  359

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 3690
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 347
Unique (%): 94.0%

Sample

1st row: 2014-05-15
2nd row: 2009-08-24
3rd row: 2016-05-04
4th row: 2008-07-30
5th row: 2007-12-06

Common Values

ValueCountFrequency (%)
2017-12-17 2
 
< 0.1%
2017-06-10 2
 
< 0.1%
2018-07-28 2
 
< 0.1%
2017-07-06 2
 
< 0.1%
2014-01-28 2
 
< 0.1%
2020-01-09 2
 
< 0.1%
2010-12-10 2
 
< 0.1%
2016-02-17 2
 
< 0.1%
2008-11-19 2
 
< 0.1%
2013-02-23 2
 
< 0.1%
Other values (348) 349
 
8.2%
(Missing) 3903
91.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2017-12-17 2
 
0.5%
2010-12-10 2
 
0.5%
2017-06-10 2
 
0.5%
2013-02-23 2
 
0.5%
2008-11-19 2
 
0.5%
2016-02-17 2
 
0.5%
2010-09-21 2
 
0.5%
2020-01-09 2
 
0.5%
2014-01-28 2
 
0.5%
2017-07-06 2
 
0.5%
Other values (348) 349
94.6%

Most occurring characters

ValueCountFrequency (%)
0 848
23.0%
- 738
20.0%
1 686
18.6%
2 613
16.6%
7 146
 
4.0%
6 128
 
3.5%
5 119
 
3.2%
3 114
 
3.1%
4 106
 
2.9%
9 101
 
2.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 2952
80.0%
Dash Punctuation 738
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 848
28.7%
1 686
23.2%
2 613
20.8%
7 146
 
4.9%
6 128
 
4.3%
5 119
 
4.0%
3 114
 
3.9%
4 106
 
3.6%
9 101
 
3.4%
8 91
 
3.1%
Dash Punctuation
ValueCountFrequency (%)
- 738
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 3690
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 848
23.0%
- 738
20.0%
1 686
18.6%
2 613
16.6%
7 146
 
4.0%
6 128
 
3.5%
5 119
 
3.2%
3 114
 
3.1%
4 106
 
2.9%
9 101
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3690
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 848
23.0%
- 738
20.0%
1 686
18.6%
2 613
16.6%
7 146
 
4.0%
6 128
 
3.5%
5 119
 
3.2%
3 114
 
3.1%
4 106
 
2.9%
9 101
 
2.7%
Distinct: 13
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 33.5 KiB

C509              1927
C504              1155
C502              282
C505              247
C508              175
Other values (8)  486

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 17088
Distinct characters: 10
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): 0.1%

Sample

1st row: C504
2nd row: C508
3rd row: C509
4th row: C505
5th row: C508

Common Values

ValueCountFrequency (%)
C509 1927
45.1%
C504 1155
27.0%
C502 282
 
6.6%
C505 247
 
5.8%
C508 175
 
4.1%
C503 174
 
4.1%
C500 168
 
3.9%
C501 131
 
3.1%
C506 9
 
0.2%
C049 1
 
< 0.1%
Other values (3) 3
 
0.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
c509 1927
45.1%
c504 1155
27.0%
c502 282
 
6.6%
c505 247
 
5.8%
c508 175
 
4.1%
c503 174
 
4.1%
c500 168
 
3.9%
c501 131
 
3.1%
c506 9
 
0.2%
c049 1
 
< 0.1%
Other values (3) 3
 
0.1%

Most occurring characters

ValueCountFrequency (%)
5 4516
26.4%
0 4439
26.0%
C 4272
25.0%
9 1930
11.3%
4 1156
 
6.8%
2 283
 
1.7%
8 176
 
1.0%
3 175
 
1.0%
1 132
 
0.8%
6 9
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 12816
75.0%
Uppercase Letter 4272
 
25.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
5 4516
35.2%
0 4439
34.6%
9 1930
15.1%
4 1156
 
9.0%
2 283
 
2.2%
8 176
 
1.4%
3 175
 
1.4%
1 132
 
1.0%
6 9
 
0.1%
Uppercase Letter
ValueCountFrequency (%)
C 4272
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 12816
75.0%
Latin 4272
 
25.0%

Most frequent character per script

Common
ValueCountFrequency (%)
5 4516
35.2%
0 4439
34.6%
9 1930
15.1%
4 1156
 
9.0%
2 283
 
2.2%
8 176
 
1.4%
3 175
 
1.4%
1 132
 
1.0%
6 9
 
0.1%
Latin
ValueCountFrequency (%)
C 4272
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 17088
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
5 4516
26.4%
0 4439
26.0%
C 4272
25.0%
9 1930
11.3%
4 1156
 
6.8%
2 283
 
1.7%
8 176
 
1.0%
3 175
 
1.0%
1 132
 
0.8%
6 9
 
0.1%

codigo_da_topografia_cid_o_2
Categorical

HIGH CARDINALITY  MISSING 

Distinct: 72
Distinct (%): 19.5%
Missing: 3903
Missing (%): 91.4%
Memory size: 33.5 KiB

C509               87
C504               42
C739               17
C502               15
C446               12
Other values (67)  196

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 1476
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 34
Unique (%): 9.2%

Sample

1st row: C539
2nd row: C186
3rd row: C509
4th row: C445
5th row: C447

Common Values

ValueCountFrequency (%)
C509 87
 
2.0%
C504 42
 
1.0%
C739 17
 
0.4%
C502 15
 
0.4%
C446 12
 
0.3%
C649 12
 
0.3%
C503 12
 
0.3%
C508 11
 
0.3%
C505 10
 
0.2%
C443 9
 
0.2%
Other values (62) 142
 
3.3%
(Missing) 3903
91.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
c509 87
23.6%
c504 42
 
11.4%
c739 17
 
4.6%
c502 15
 
4.1%
c446 12
 
3.3%
c649 12
 
3.3%
c503 12
 
3.3%
c508 11
 
3.0%
c505 10
 
2.7%
c443 9
 
2.4%
Other values (62) 142
38.5%

Most occurring characters

ValueCountFrequency (%)
C 369
25.0%
5 234
15.9%
0 222
15.0%
4 178
12.1%
9 155
10.5%
3 78
 
5.3%
1 71
 
4.8%
6 54
 
3.7%
2 52
 
3.5%
7 36
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1107
75.0%
Uppercase Letter 369
 
25.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
5 234
21.1%
0 222
20.1%
4 178
16.1%
9 155
14.0%
3 78
 
7.0%
1 71
 
6.4%
6 54
 
4.9%
2 52
 
4.7%
7 36
 
3.3%
8 27
 
2.4%
Uppercase Letter
ValueCountFrequency (%)
C 369
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1107
75.0%
Latin 369
 
25.0%

Most frequent character per script

Common
ValueCountFrequency (%)
5 234
21.1%
0 222
20.1%
4 178
16.1%
9 155
14.0%
3 78
 
7.0%
1 71
 
6.4%
6 54
 
4.9%
2 52
 
4.7%
7 36
 
3.3%
8 27
 
2.4%
Latin
ValueCountFrequency (%)
C 369
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1476
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
C 369
25.0%
5 234
15.9%
0 222
15.0%
4 178
12.1%
9 155
10.5%
3 78
 
5.3%
1 71
 
4.8%
6 54
 
3.7%
2 52
 
3.5%
7 36
 
2.4%
Distinct: 43
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 84952.631
Minimum: 80103
Maximum: 97553
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.5 KiB

Quantile statistics

Minimum: 80103
5-th percentile: 85003
Q1: 85003
Median: 85003
Q3: 85003
95-th percentile: 85203
Maximum: 97553
Range: 17450
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 657.00693
Coefficient of variation (CV): 0.0077338032
Kurtosis: 68.057451
Mean: 84952.631
Median Absolute Deviation (MAD): 0
Skewness: -2.4878973
Sum: 3.6291764 × 10^8
Variance: 431658.11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=43)
ValueCountFrequency (%)
85003 3793
88.8%
85203 140
 
3.3%
84803 49
 
1.1%
85753 46
 
1.1%
80503 38
 
0.9%
85002 30
 
0.7%
85033 19
 
0.4%
85503 19
 
0.4%
85233 18
 
0.4%
85223 14
 
0.3%
Other values (33) 106
 
2.5%
ValueCountFrequency (%)
80103 9
 
0.2%
80203 1
 
< 0.1%
80223 1
 
< 0.1%
80333 1
 
< 0.1%
80502 3
 
0.1%
80503 38
0.9%
80703 5
 
0.1%
80713 1
 
< 0.1%
81403 4
 
0.1%
82003 8
 
0.2%
ValueCountFrequency (%)
97553 1
 
< 0.1%
90203 5
 
0.1%
89803 1
 
< 0.1%
88903 1
 
< 0.1%
88323 1
 
< 0.1%
88003 2
 
< 0.1%
85753 46
1.1%
85503 19
0.4%
85433 2
 
< 0.1%
85413 8
 
0.2%

codigo_da_morfologia_de_acordo_com_o_cid_o_2
Real number (ℝ)

MISSING

Distinct: 73
Distinct (%): 19.8%
Missing: 3903
Missing (%): 91.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 84178.667
Minimum: 80103
Maximum: 99873
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.5 KiB

Quantile statistics

Minimum: 80103
5-th percentile: 80703
Q1: 82113
Median: 85002
Q3: 85003
95-th percentile: 87785
Maximum: 99873
Range: 19770
Interquartile range (IQR): 2890

Descriptive statistics

Standard deviation: 2990.1006
Coefficient of variation (CV): 0.035520884
Kurtosis: 9.1598417
Mean: 84178.667
Median Absolute Deviation (MAD): 501
Skewness: 2.3002379
Sum: 31061928
Variance: 8940701.8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
85003 111
 
2.6%
85002 43
 
1.0%
80703 25
 
0.6%
81403 23
 
0.5%
80973 13
 
0.3%
85203 11
 
0.3%
83123 11
 
0.3%
82113 10
 
0.2%
82603 9
 
0.2%
85503 8
 
0.2%
Other values (63) 105
 
2.5%
(Missing) 3903
91.4%
ValueCountFrequency (%)
80103 2
 
< 0.1%
80413 2
 
< 0.1%
80463 1
 
< 0.1%
80503 3
 
0.1%
80702 4
 
0.1%
80703 25
0.6%
80713 1
 
< 0.1%
80762 1
 
< 0.1%
80772 1
 
< 0.1%
80812 3
 
0.1%
ValueCountFrequency (%)
99873 1
< 0.1%
99203 1
< 0.1%
98663 2
< 0.1%
97323 1
< 0.1%
96993 1
< 0.1%
96803 2
< 0.1%
95403 1
< 0.1%
93913 1
< 0.1%
91813 1
< 0.1%
91203 1
< 0.1%
Distinct: 14
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 33.5 KiB
IIA
911 
IIIA
722 
IIB
715 
IV
544 
IIIB
460 
Other values (9)
920 

Length

Max length: 30
Median length: 5
Mean length: 2.9824438
Min length: 1

Characters and Unicode

Total characters: 12741
Distinct characters: 27
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
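As a rough illustration of how such per-character properties can be tallied, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The sample values are taken from this report; the function name and the label mapping are illustrative assumptions and not the profiler's own code.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
}

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

sample = ["IIA", "IV", "0", "Y: NA"]
counts = category_counts(sample)
for name, count in counts.most_common():
    print(name, count)
```

A check such as `ch.isascii()` separates ASCII characters from the accented ones (for example ã and í), which the block tables in this report group under "None".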

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: IIA
2nd row: IIIA
3rd row: IIA
4th row: IIA
5th row: IIB

Common Values

ValueCountFrequency (%)
IIA 911
21.3%
IIIA 722
16.9%
IIB 715
16.7%
IV 544
12.7%
IIIB 460
10.8%
IA 408
9.6%
I 240
 
5.6%
IIIC 183
 
4.3%
0 37
 
0.9%
IB 36
 
0.8%
Other values (4) 16
 
0.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
iia 911
21.2%
iiia 722
16.8%
iib 715
16.6%
iv 544
12.7%
iiib 460
10.7%
ia 408
9.5%
i 240
 
5.6%
iiic 183
 
4.3%
0 37
 
0.9%
ib 36
 
0.8%
Other values (9) 39
 
0.9%

Most occurring characters

ValueCountFrequency (%)
I 8578
67.3%
A 2052
 
16.1%
B 1212
 
9.5%
V 545
 
4.3%
C 183
 
1.4%
0 37
 
0.3%
(space) 23
 
0.2%
: 14
 
0.1%
Y 11
 
0.1%
N 11
 
0.1%
Other values (17) 75
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 12595
98.9%
Lowercase Letter 72
 
0.6%
Decimal Number 37
 
0.3%
Space Separator 23
 
0.2%
Other Punctuation 14
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 9
12.5%
o 9
12.5%
n 6
 
8.3%
i 6
 
8.3%
s 6
 
8.3%
r 6
 
8.3%
ã 3
 
4.2%
f 3
 
4.2%
p 3
 
4.2%
í 3
 
4.2%
Other values (6) 18
25.0%
Uppercase Letter
ValueCountFrequency (%)
I 8578
68.1%
A 2052
 
16.3%
B 1212
 
9.6%
V 545
 
4.3%
C 183
 
1.5%
Y 11
 
0.1%
N 11
 
0.1%
X 3
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
0 37
100.0%
Space Separator
ValueCountFrequency (%)
(space) 23
100.0%
Other Punctuation
ValueCountFrequency (%)
: 14
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 12667
99.4%
Common 74
 
0.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
I 8578
67.7%
A 2052
 
16.2%
B 1212
 
9.6%
V 545
 
4.3%
C 183
 
1.4%
Y 11
 
0.1%
N 11
 
0.1%
e 9
 
0.1%
o 9
 
0.1%
n 6
 
< 0.1%
Other values (14) 51
 
0.4%
Common
ValueCountFrequency (%)
0 37
50.0%
(space) 23
31.1%
: 14
 
18.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 12735
> 99.9%
None 6
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
I 8578
67.4%
A 2052
 
16.1%
B 1212
 
9.5%
V 545
 
4.3%
C 183
 
1.4%
0 37
 
0.3%
(space) 23
 
0.2%
: 14
 
0.1%
Y 11
 
0.1%
N 11
 
0.1%
Other values (15) 69
 
0.5%
None
ValueCountFrequency (%)
ã 3
50.0%
í 3
50.0%
Distinct: 21
Distinct (%): 5.7%
Missing: 3903
Missing (%): 91.4%
Memory size: 33.5 KiB
0
60 
IA
59 
I
55 
IIA
45 
IV
32 
Other values (16)
118 

Length

Max length: 30
Median length: 5
Mean length: 2.4634146
Min length: 1

Characters and Unicode

Total characters: 909
Distinct characters: 29
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5
Unique (%): 1.4%

Sample

1st row: IVB
2nd row: IV
3rd row: IIA
4th row: IIA
5th row: IA

Common Values

ValueCountFrequency (%)
0 60
 
1.4%
IA 59
 
1.4%
I 55
 
1.3%
IIA 45
 
1.1%
IV 32
 
0.7%
Y: NA 16
 
0.4%
IIIB 16
 
0.4%
IIB 15
 
0.4%
III 15
 
0.4%
II 12
 
0.3%
Other values (11) 44
 
1.0%
(Missing) 3903
91.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
0 60
15.1%
ia 59
14.9%
i 55
13.9%
iia 45
11.3%
iv 32
8.1%
y 16
 
4.0%
na 16
 
4.0%
iiib 16
 
4.0%
iib 15
 
3.8%
iii 15
 
3.8%
Other values (16) 68
17.1%

Most occurring characters

ValueCountFrequency (%)
I 452
49.7%
A 136
 
15.0%
0 61
 
6.7%
V 48
 
5.3%
B 48
 
5.3%
(space) 28
 
3.1%
: 19
 
2.1%
Y 16
 
1.8%
N 16
 
1.8%
o 9
 
1.0%
Other values (19) 76
 
8.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 727
80.0%
Lowercase Letter 72
 
7.9%
Decimal Number 63
 
6.9%
Space Separator 28
 
3.1%
Other Punctuation 19
 
2.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 9
12.5%
e 9
12.5%
n 6
 
8.3%
i 6
 
8.3%
r 6
 
8.3%
s 6
 
8.3%
m 3
 
4.2%
l 3
 
4.2%
t 3
 
4.2%
d 3
 
4.2%
Other values (6) 18
25.0%
Uppercase Letter
ValueCountFrequency (%)
I 452
62.2%
A 136
 
18.7%
V 48
 
6.6%
B 48
 
6.6%
Y 16
 
2.2%
N 16
 
2.2%
C 8
 
1.1%
X 3
 
0.4%
Decimal Number
ValueCountFrequency (%)
0 61
96.8%
1 1
 
1.6%
2 1
 
1.6%
Space Separator
ValueCountFrequency (%)
(space) 28
100.0%
Other Punctuation
ValueCountFrequency (%)
: 19
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 799
87.9%
Common 110
 
12.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
I 452
56.6%
A 136
 
17.0%
V 48
 
6.0%
B 48
 
6.0%
Y 16
 
2.0%
N 16
 
2.0%
o 9
 
1.1%
e 9
 
1.1%
C 8
 
1.0%
n 6
 
0.8%
Other values (14) 51
 
6.4%
Common
ValueCountFrequency (%)
0 61
55.5%
(space) 28
25.5%
: 19
 
17.3%
1 1
 
0.9%
2 1
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 903
99.3%
None 6
 
0.7%

Most frequent character per block

ASCII
ValueCountFrequency (%)
I 452
50.1%
A 136
 
15.1%
0 61
 
6.8%
V 48
 
5.3%
B 48
 
5.3%
(space) 28
 
3.1%
: 19
 
2.1%
Y 16
 
1.8%
N 16
 
1.8%
o 9
 
1.0%
Other values (17) 70
 
7.8%
None
ValueCountFrequency (%)
í 3
50.0%
ã 3
50.0%
Distinct: 7
Distinct (%): 0.2%
Missing: 195
Missing (%): 4.6%
Memory size: 33.5 KiB
II
1570 
III
1291 
I
663 
IV
513 
0
 
28
Other values (2)
 
12

Length

Max length: 31
Median length: 2
Mean length: 2.1751288
Min length: 1

Characters and Unicode

Total characters: 8868
Distinct characters: 23
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: II
2nd row: III
3rd row: II
4th row: II
5th row: II

Common Values

ValueCountFrequency (%)
II 1570
36.8%
III 1291
30.2%
I 663
15.5%
IV 513
 
12.0%
0 28
 
0.7%
Y: Na 9
 
0.2%
X - nao foi possivel determinar 3
 
0.1%
(Missing) 195
 
4.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
ii 1570
38.3%
iii 1291
31.5%
i 663
16.2%
iv 513
 
12.5%
0 28
 
0.7%
y 9
 
0.2%
na 9
 
0.2%
x 3
 
0.1%
3
 
0.1%
nao 3
 
0.1%
Other values (3) 9
 
0.2%

Most occurring characters

ValueCountFrequency (%)
I 8189
92.3%
V 513
 
5.8%
0 28
 
0.3%
(space) 24
 
0.3%
a 15
 
0.2%
e 9
 
0.1%
i 9
 
0.1%
o 9
 
0.1%
N 9
 
0.1%
: 9
 
0.1%
Other values (13) 54
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 8723
98.4%
Lowercase Letter 81
 
0.9%
Decimal Number 28
 
0.3%
Space Separator 24
 
0.3%
Other Punctuation 9
 
0.1%
Dash Punctuation 3
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 15
18.5%
e 9
11.1%
i 9
11.1%
o 9
11.1%
n 6
 
7.4%
s 6
 
7.4%
r 6
 
7.4%
f 3
 
3.7%
p 3
 
3.7%
v 3
 
3.7%
Other values (4) 12
14.8%
Uppercase Letter
ValueCountFrequency (%)
I 8189
93.9%
V 513
 
5.9%
N 9
 
0.1%
Y 9
 
0.1%
X 3
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
0 28
100.0%
Space Separator
ValueCountFrequency (%)
(space) 24
100.0%
Other Punctuation
ValueCountFrequency (%)
: 9
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 8804
99.3%
Common 64
 
0.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
I 8189
93.0%
V 513
 
5.8%
a 15
 
0.2%
e 9
 
0.1%
i 9
 
0.1%
o 9
 
0.1%
N 9
 
0.1%
Y 9
 
0.1%
n 6
 
0.1%
s 6
 
0.1%
Other values (9) 30
 
0.3%
Common
ValueCountFrequency (%)
0 28
43.8%
(space) 24
37.5%
: 9
 
14.1%
- 3
 
4.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8868
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
I 8189
92.3%
V 513
 
5.8%
0 28
 
0.3%
(space) 24
 
0.3%
a 15
 
0.2%
e 9
 
0.1%
i 9
 
0.1%
o 9
 
0.1%
N 9
 
0.1%
: 9
 
0.1%
Other values (13) 54
 
0.6%
Distinct: 7
Distinct (%): 2.2%
Missing: 3959
Missing (%): 92.7%
Memory size: 33.5 KiB
I
108 
II
63 
0
49 
III
40 
IV
37 
Other values (2)
16 

Length

Max length: 31
Median length: 1
Mean length: 1.9456869
Min length: 1

Characters and Unicode

Total characters: 609
Distinct characters: 23
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: IV
2nd row: IV
3rd row: II
4th row: I
5th row: 0

Common Values

ValueCountFrequency (%)
I 108
 
2.5%
II 63
 
1.5%
0 49
 
1.1%
III 40
 
0.9%
IV 37
 
0.9%
Y: Na 14
 
0.3%
X - nao foi possivel determinar 2
 
< 0.1%
(Missing) 3959
92.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
i 108
32.0%
ii 63
18.7%
0 49
14.5%
iii 40
 
11.9%
iv 37
 
11.0%
y 14
 
4.2%
na 14
 
4.2%
x 2
 
0.6%
2
 
0.6%
nao 2
 
0.6%
Other values (3) 6
 
1.8%

Most occurring characters

ValueCountFrequency (%)
I 391
64.2%
0 49
 
8.0%
V 37
 
6.1%
(space) 24
 
3.9%
a 18
 
3.0%
Y 14
 
2.3%
: 14
 
2.3%
N 14
 
2.3%
e 6
 
1.0%
i 6
 
1.0%
Other values (13) 36
 
5.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 458
75.2%
Lowercase Letter 62
 
10.2%
Decimal Number 49
 
8.0%
Space Separator 24
 
3.9%
Other Punctuation 14
 
2.3%
Dash Punctuation 2
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 18
29.0%
e 6
 
9.7%
i 6
 
9.7%
o 6
 
9.7%
n 4
 
6.5%
s 4
 
6.5%
r 4
 
6.5%
f 2
 
3.2%
p 2
 
3.2%
v 2
 
3.2%
Other values (4) 8
12.9%
Uppercase Letter
ValueCountFrequency (%)
I 391
85.4%
V 37
 
8.1%
Y 14
 
3.1%
N 14
 
3.1%
X 2
 
0.4%
Decimal Number
ValueCountFrequency (%)
0 49
100.0%
Space Separator
ValueCountFrequency (%)
(space) 24
100.0%
Other Punctuation
ValueCountFrequency (%)
: 14
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 520
85.4%
Common 89
 
14.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
I 391
75.2%
V 37
 
7.1%
a 18
 
3.5%
Y 14
 
2.7%
N 14
 
2.7%
e 6
 
1.2%
i 6
 
1.2%
o 6
 
1.2%
n 4
 
0.8%
s 4
 
0.8%
Other values (9) 20
 
3.8%
Common
ValueCountFrequency (%)
0 49
55.1%
(space) 24
27.0%
: 14
 
15.7%
- 2
 
2.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 609
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
I 391
64.2%
0 49
 
8.0%
V 37
 
6.1%
(space) 24
 
3.9%
a 18
 
3.0%
Y 14
 
2.3%
: 14
 
2.3%
N 14
 
2.3%
e 6
 
1.0%
i 6
 
1.0%
Other values (13) 36
 
5.9%
Distinct: 18
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 33.5 KiB
2
1609 
3
854 
1C
579 
4B
489 
4D
 
135
Other values (13)
606 

Length

Max length: 31
Median length: 1
Mean length: 1.5414326
Min length: 1

Characters and Unicode

Total characters: 6585
Distinct characters: 33
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 2
2nd row: 3
3rd row: 2
4th row: 1
5th row: 2

Common Values

ValueCountFrequency (%)
2 1609
37.7%
3 854
20.0%
1C 579
 
13.6%
4B 489
 
11.4%
4D 135
 
3.2%
1B 133
 
3.1%
4 128
 
3.0%
1 126
 
2.9%
1A 73
 
1.7%
4C 42
 
1.0%
Other values (8) 104
 
2.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2 1609
36.5%
3 854
19.4%
1c 579
 
13.2%
4b 489
 
11.1%
4d 135
 
3.1%
1b 133
 
3.0%
4 128
 
2.9%
1 126
 
2.9%
1a 73
 
1.7%
4c 42
 
1.0%
Other values (14) 235
 
5.3%

Most occurring characters

ValueCountFrequency (%)
2 1609
24.4%
1 917
13.9%
3 854
13.0%
4 817
12.4%
C 637
 
9.7%
B 622
 
9.4%
D 144
 
2.2%
(space) 131
 
2.0%
A 96
 
1.5%
o 72
 
1.1%
Other values (23) 686
10.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 4200
63.8%
Uppercase Letter 1632
 
24.8%
Lowercase Letter 587
 
8.9%
Space Separator 131
 
2.0%
Dash Punctuation 24
 
0.4%
Other Punctuation 11
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 72
12.3%
i 72
12.3%
e 72
12.3%
a 59
10.1%
r 48
8.2%
n 48
8.2%
s 48
8.2%
m 24
 
4.1%
l 24
 
4.1%
t 24
 
4.1%
Other values (4) 96
16.4%
Uppercase Letter
ValueCountFrequency (%)
C 637
39.0%
B 622
38.1%
D 144
 
8.8%
A 96
 
5.9%
I 43
 
2.6%
S 37
 
2.3%
X 24
 
1.5%
Y 11
 
0.7%
N 11
 
0.7%
M 6
 
0.4%
Decimal Number
ValueCountFrequency (%)
2 1609
38.3%
1 917
21.8%
3 854
20.3%
4 817
19.5%
0 3
 
0.1%
Space Separator
ValueCountFrequency (%)
(space) 131
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 24
100.0%
Other Punctuation
ValueCountFrequency (%)
: 11
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 4366
66.3%
Latin 2219
33.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
C 637
28.7%
B 622
28.0%
D 144
 
6.5%
A 96
 
4.3%
o 72
 
3.2%
i 72
 
3.2%
e 72
 
3.2%
a 59
 
2.7%
r 48
 
2.2%
n 48
 
2.2%
Other values (15) 349
15.7%
Common
ValueCountFrequency (%)
2 1609
36.9%
1 917
21.0%
3 854
19.6%
4 817
18.7%
(space) 131
 
3.0%
- 24
 
0.5%
: 11
 
0.3%
0 3
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6585
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 1609
24.4%
1 917
13.9%
3 854
13.0%
4 817
12.4%
C 637
 
9.7%
B 622
 
9.4%
D 144
 
2.2%
(space) 131
 
2.0%
A 96
 
1.5%
o 72
 
1.1%
Other values (23) 686
10.4%
Distinct: 19
Distinct (%): 5.1%
Missing: 3903
Missing (%): 91.4%
Memory size: 33.5 KiB
2
60 
1
43 
IS
38 
3
36 
1C
30 
Other values (14)
162 

Length

Max length: 31
Median length: 5
Mean length: 3.7940379
Min length: 1

Characters and Unicode

Total characters: 1400
Distinct characters: 30
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 0.3%

Sample

1st row: X - nao foi possivel determinar
2nd row: 3
3rd row: 2
4th row: 3A
5th row: 1A

Common Values

ValueCountFrequency (%)
2 60
 
1.4%
1 43
 
1.0%
IS 38
 
0.9%
3 36
 
0.8%
1C 30
 
0.7%
1B 27
 
0.6%
X - nao foi possivel determinar 25
 
0.6%
1A 23
 
0.5%
CDIS 21
 
0.5%
Y: Na 16
 
0.4%
Other values (9) 50
 
1.2%
(Missing) 3903
91.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2 60
 
11.8%
1 43
 
8.4%
is 38
 
7.5%
3 36
 
7.1%
1c 30
 
5.9%
1b 27
 
5.3%
x 25
 
4.9%
25
 
4.9%
nao 25
 
4.9%
foi 25
 
4.9%
Other values (15) 176
34.5%

Most occurring characters

ValueCountFrequency (%)
(space) 141
 
10.1%
1 123
 
8.8%
o 75
 
5.4%
e 75
 
5.4%
i 75
 
5.4%
2 68
 
4.9%
a 66
 
4.7%
I 59
 
4.2%
S 59
 
4.2%
C 51
 
3.6%
Other values (20) 608
43.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 616
44.0%
Uppercase Letter 334
23.9%
Decimal Number 268
19.1%
Space Separator 141
 
10.1%
Dash Punctuation 25
 
1.8%
Other Punctuation 16
 
1.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 75
12.2%
e 75
12.2%
i 75
12.2%
a 66
10.7%
s 50
8.1%
r 50
8.1%
n 50
8.1%
v 25
 
4.1%
f 25
 
4.1%
l 25
 
4.1%
Other values (4) 100
16.2%
Uppercase Letter
ValueCountFrequency (%)
I 59
17.7%
S 59
17.7%
C 51
15.3%
B 41
12.3%
A 41
12.3%
D 26
7.8%
X 25
7.5%
Y 16
 
4.8%
N 16
 
4.8%
Decimal Number
ValueCountFrequency (%)
1 123
45.9%
2 68
25.4%
3 43
 
16.0%
4 34
 
12.7%
Space Separator
ValueCountFrequency (%)
(space) 141
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 25
100.0%
Other Punctuation
ValueCountFrequency (%)
: 16
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 950
67.9%
Common 450
32.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 75
 
7.9%
e 75
 
7.9%
i 75
 
7.9%
a 66
 
6.9%
I 59
 
6.2%
S 59
 
6.2%
C 51
 
5.4%
s 50
 
5.3%
r 50
 
5.3%
n 50
 
5.3%
Other values (13) 340
35.8%
Common
ValueCountFrequency (%)
(space) 141
31.3%
1 123
27.3%
2 68
15.1%
3 43
 
9.6%
4 34
 
7.6%
- 25
 
5.6%
: 16
 
3.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1400
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space) 141
 
10.1%
1 123
 
8.8%
o 75
 
5.4%
e 75
 
5.4%
i 75
 
5.4%
2 68
 
4.9%
a 66
 
4.7%
I 59
 
4.2%
S 59
 
4.2%
C 51
 
3.6%
Other values (20) 608
43.4%
Distinct: 11
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 33.5 KiB
0
1778 
1
1395 
2
510 
2A
223 
3
 
139
Other values (6)
227 

Length

Max length: 31
Median length: 1
Mean length: 1.3167135
Min length: 1

Characters and Unicode

Total characters: 5625
Distinct characters: 27
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 0
4th row: 1
5th row: 1

Common Values

ValueCountFrequency (%)
0 1778
41.6%
1 1395
32.7%
2 510
 
11.9%
2A 223
 
5.2%
3 139
 
3.3%
3A 83
 
1.9%
3C 51
 
1.2%
3B 31
 
0.7%
X - nao foi possivel determinar 30
 
0.7%
2B 21
 
0.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
0 1778
40.1%
1 1395
31.5%
2 510
 
11.5%
2a 223
 
5.0%
3 139
 
3.1%
3a 83
 
1.9%
3c 51
 
1.2%
3b 31
 
0.7%
foi 30
 
0.7%
determinar 30
 
0.7%
Other values (7) 163
 
3.7%

Most occurring characters

ValueCountFrequency (%)
0 1778
31.6%
1 1395
24.8%
2 754
13.4%
A 306
 
5.4%
3 304
 
5.4%
(space) 161
 
2.9%
i 90
 
1.6%
o 90
 
1.6%
e 90
 
1.6%
a 71
 
1.3%
Other values (17) 586
 
10.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 4231
75.2%
Lowercase Letter 731
 
13.0%
Uppercase Letter 461
 
8.2%
Space Separator 161
 
2.9%
Dash Punctuation 30
 
0.5%
Other Punctuation 11
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 90
12.3%
o 90
12.3%
e 90
12.3%
a 71
9.7%
n 60
8.2%
s 60
8.2%
r 60
8.2%
l 30
 
4.1%
m 30
 
4.1%
t 30
 
4.1%
Other values (4) 120
16.4%
Uppercase Letter
ValueCountFrequency (%)
A 306
66.4%
B 52
 
11.3%
C 51
 
11.1%
X 30
 
6.5%
Y 11
 
2.4%
N 11
 
2.4%
Decimal Number
ValueCountFrequency (%)
0 1778
42.0%
1 1395
33.0%
2 754
17.8%
3 304
 
7.2%
Space Separator
ValueCountFrequency (%)
(space) 161
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 30
100.0%
Other Punctuation
ValueCountFrequency (%)
: 11
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 4433
78.8%
Latin 1192
 
21.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 306
25.7%
i 90
 
7.6%
o 90
 
7.6%
e 90
 
7.6%
a 71
 
6.0%
n 60
 
5.0%
s 60
 
5.0%
r 60
 
5.0%
B 52
 
4.4%
C 51
 
4.3%
Other values (10) 262
22.0%
Common
ValueCountFrequency (%)
0 1778
40.1%
1 1395
31.5%
2 754
17.0%
3 304
 
6.9%
(space) 161
 
3.6%
- 30
 
0.7%
: 11
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5625
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 1778
31.6%
1 1395
24.8%
2 754
13.4%
A 306
 
5.4%
3 304
 
5.4%
(space) 161
 
2.9%
i 90
 
1.6%
o 90
 
1.6%
e 90
 
1.6%
a 71
 
1.3%
Other values (17) 586
 
10.4%

classificacao_tnm_clinico_n_2
Categorical

IMBALANCE  MISSING 

Distinct: 13
Distinct (%): 3.5%
Missing: 3903
Missing (%): 91.4%
Memory size: 33.5 KiB
0
258 
1
40 
X - nao foi possivel determinar
26 
Y: Na
 
16
2
 
8
Other values (8)
 
21

Length

Max length: 31
Median length: 1
Mean length: 3.3279133
Min length: 1

Characters and Unicode

Total characters: 1228
Distinct characters: 27
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): 0.5%

Sample

1st row: X - nao foi possivel determinar
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

ValueCountFrequency (%)
0 258
 
6.0%
1 40
 
0.9%
X - nao foi possivel determinar 26
 
0.6%
Y: Na 16
 
0.4%
2 8
 
0.2%
3 6
 
0.1%
2A 4
 
0.1%
1A 3
 
0.1%
3A 2
 
< 0.1%
1B 2
 
< 0.1%
Other values (3) 4
 
0.1%
(Missing) 3903
91.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
0 258
50.1%
1 40
 
7.8%
x 26
 
5.0%
26
 
5.0%
nao 26
 
5.0%
foi 26
 
5.0%
possivel 26
 
5.0%
determinar 26
 
5.0%
na 16
 
3.1%
y 16
 
3.1%
Other values (9) 29
 
5.6%

Most occurring characters

ValueCountFrequency (%)
0 258
21.0%
(space) 146
11.9%
o 78
 
6.4%
i 78
 
6.4%
e 78
 
6.4%
a 68
 
5.5%
n 52
 
4.2%
r 52
 
4.2%
s 52
 
4.2%
1 45
 
3.7%
Other values (17) 321
26.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 640
52.1%
Decimal Number 327
26.6%
Space Separator 146
 
11.9%
Uppercase Letter 73
 
5.9%
Dash Punctuation 26
 
2.1%
Other Punctuation 16
 
1.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 78
12.2%
i 78
12.2%
e 78
12.2%
a 68
10.6%
n 52
8.1%
r 52
8.1%
s 52
8.1%
m 26
 
4.1%
d 26
 
4.1%
l 26
 
4.1%
Other values (4) 104
16.2%
Uppercase Letter
ValueCountFrequency (%)
X 26
35.6%
Y 16
21.9%
N 16
21.9%
A 9
 
12.3%
B 5
 
6.8%
C 1
 
1.4%
Decimal Number
ValueCountFrequency (%)
0 258
78.9%
1 45
 
13.8%
2 14
 
4.3%
3 10
 
3.1%
Space Separator
ValueCountFrequency (%)
(space) 146
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 26
100.0%
Other Punctuation
ValueCountFrequency (%)
: 16
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 713
58.1%
Common 515
41.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 78
10.9%
i 78
10.9%
e 78
10.9%
a 68
 
9.5%
n 52
 
7.3%
r 52
 
7.3%
s 52
 
7.3%
m 26
 
3.6%
d 26
 
3.6%
l 26
 
3.6%
Other values (10) 177
24.8%
Common
ValueCountFrequency (%)
0 258
50.1%
(space) 146
28.3%
1 45
 
8.7%
- 26
 
5.0%
: 16
 
3.1%
2 14
 
2.7%
3 10
 
1.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1228
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 258
21.0%
(space) 146
11.9%
o 78
 
6.4%
i 78
 
6.4%
e 78
 
6.4%
a 68
 
5.5%
n 52
 
4.2%
r 52
 
4.2%
s 52
 
4.2%
1 45
 
3.7%
Other values (17) 321
26.1%
Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.5 KiB
0
3712 
1
546 
Y: Na
 
11
X - nao foi possivel determinar
 
3

Length

Max length: 31
Median length: 1
Mean length: 1.031367
Min length: 1

Characters and Unicode

Total characters: 4406
Distinct characters: 22
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

ValueCountFrequency (%)
0 3712
86.9%
1 546
 
12.8%
Y: Na 11
 
0.3%
X - nao foi possivel determinar 3
 
0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 3712
86.4%
1 546
 
12.7%
y 11
 
0.3%
na 11
 
0.3%
x 3
 
0.1%
3
 
0.1%
nao 3
 
0.1%
foi 3
 
0.1%
possivel 3
 
0.1%
determinar 3
 
0.1%

Most occurring characters

ValueCountFrequency (%)
0 3712
84.2%
1 546
 
12.4%
(space) 26
 
0.6%
a 17
 
0.4%
Y 11
 
0.2%
: 11
 
0.2%
N 11
 
0.2%
e 9
 
0.2%
o 9
 
0.2%
i 9
 
0.2%
Other values (12) 45
 
1.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 4258
96.6%
Lowercase Letter 83
 
1.9%
Space Separator 26
 
0.6%
Uppercase Letter 25
 
0.6%
Other Punctuation 11
 
0.2%
Dash Punctuation 3
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 17
20.5%
e 9
10.8%
o 9
10.8%
i 9
10.8%
r 6
 
7.2%
n 6
 
7.2%
s 6
 
7.2%
t 3
 
3.6%
d 3
 
3.6%
l 3
 
3.6%
Other values (4) 12
14.5%
Uppercase Letter
ValueCountFrequency (%)
Y 11
44.0%
N 11
44.0%
X 3
 
12.0%
Decimal Number
ValueCountFrequency (%)
0 3712
87.2%
1 546
 
12.8%
Space Separator
ValueCountFrequency (%)
(space) 26
100.0%
Other Punctuation
ValueCountFrequency (%)
: 11
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 4298
97.5%
Latin 108
 
2.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 17
15.7%
Y 11
10.2%
N 11
10.2%
e 9
8.3%
o 9
8.3%
i 9
8.3%
r 6
 
5.6%
n 6
 
5.6%
s 6
 
5.6%
t 3
 
2.8%
Other values (7) 21
19.4%
Common
ValueCountFrequency (%)
0 3712
86.4%
1 546
 
12.7%
(space) 26
 
0.6%
: 11
 
0.3%
- 3
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4406
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 3712
84.2%
1 546
 
12.4%
(space) 26
 
0.6%
a 17
 
0.4%
Y 11
 
0.2%
: 11
 
0.2%
N 11
 
0.2%
e 9
 
0.2%
o 9
 
0.2%
i 9
 
0.2%
Other values (12) 45
 
1.0%

classificacao_tnm_clinico_m_2
Categorical

IMBALANCE  MISSING 

Distinct: 6
Distinct (%): 1.6%
Missing: 3903
Missing (%): 91.4%
Memory size: 33.5 KiB
0
311 
1
 
30
Y: Na
 
16
1B
 
7
X - nao foi possivel determinar
 
3

Length

Max length: 31
Median length: 1
Mean length: 1.4417344
Min length: 1

Characters and Unicode

Total characters: 532
Distinct characters: 24
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 0
4th row: 0
5th row: 0

Common Values

ValueCountFrequency (%)
0 311
 
7.3%
1 30
 
0.7%
Y: Na 16
 
0.4%
1B 7
 
0.2%
X - nao foi possivel determinar 3
 
0.1%
1A 2
 
< 0.1%
(Missing) 3903
91.4%
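The Alerts section flags classificacao_tnm_clinico_m_2 as highly imbalanced (65.0%). The sketch below shows one way to arrive at that figure from the non-missing counts in the table above. It assumes the score is one minus the Shannon entropy of the class frequencies divided by the maximum possible entropy for that number of classes. This is an inference that happens to reproduce the reported value, not a statement about the profiler's actual implementation, and the function name is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - normalized Shannon entropy (0 = balanced, 1 = one class only)."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if len(probs) < 2:
        return 0.0  # a single class has no entropy to normalize (assumed convention)
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(probs))

# Non-missing counts from the Common Values table: 0, 1, "Y: Na", 1B, "X - ...", 1A
m2_counts = [311, 30, 16, 7, 3, 2]
score = imbalance_score(m2_counts)
print(f"{score:.1%}")
```

Under this definition a perfectly uniform column scores 0, and the score approaches 1 as a single value dominates, which matches the pattern here: the value 0 accounts for 311 of the 369 non-missing rows.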

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 311
77.8%
1 30
 
7.5%
y 16
 
4.0%
na 16
 
4.0%
1b 7
 
1.8%
x 3
 
0.8%
3
 
0.8%
nao 3
 
0.8%
foi 3
 
0.8%
possivel 3
 
0.8%
Other values (2) 5
 
1.2%

Most occurring characters

ValueCountFrequency (%)
0 311
58.5%
1 39
 
7.3%
(space) 31
 
5.8%
a 22
 
4.1%
Y 16
 
3.0%
: 16
 
3.0%
N 16
 
3.0%
o 9
 
1.7%
i 9
 
1.7%
e 9
 
1.7%
Other values (14) 54
 
10.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 350
65.8%
Lowercase Letter 88
 
16.5%
Uppercase Letter 44
 
8.3%
Space Separator 31
 
5.8%
Other Punctuation 16
 
3.0%
Dash Punctuation 3
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 22
25.0%
o 9
10.2%
i 9
10.2%
e 9
10.2%
n 6
 
6.8%
s 6
 
6.8%
r 6
 
6.8%
l 3
 
3.4%
m 3
 
3.4%
t 3
 
3.4%
Other values (4) 12
13.6%
Uppercase Letter
ValueCountFrequency (%)
Y 16
36.4%
N 16
36.4%
B 7
15.9%
X 3
 
6.8%
A 2
 
4.5%
Decimal Number
ValueCountFrequency (%)
0 311
88.9%
1 39
 
11.1%
Space Separator
ValueCountFrequency (%)
(space) 31
100.0%
Other Punctuation
ValueCountFrequency (%)
: 16
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 400
75.2%
Latin 132
 
24.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 22
16.7%
Y 16
12.1%
N 16
12.1%
o 9
 
6.8%
i 9
 
6.8%
e 9
 
6.8%
B 7
 
5.3%
n 6
 
4.5%
s 6
 
4.5%
r 6
 
4.5%
Other values (9) 26
19.7%
Common
ValueCountFrequency (%)
0 311
77.8%
1 39
 
9.8%
(space) 31
 
7.8%
: 16
 
4.0%
- 3
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 532
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 311
58.5%
1 39
 
7.3%
(space) 31
 
5.8%
a 22
 
4.1%
Y 16
 
3.0%
: 16
 
3.0%
N 16
 
3.0%
o 9
 
1.7%
i 9
 
1.7%
e 9
 
1.7%
Other values (14) 54
 
10.2%
Distinct: 14
Distinct (%): 2.1%
Missing: 3600
Missing (%): 84.3%
Memory size: 33.5 KiB
C34 - Bronquios e Pulmoes
196 
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações
151 
C22 - Fígado e Das Vias Biliares Intra-hepáticas
107 
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos
81 
C40 - Ossos e Cartilagens Articulares Dos Membros
81 
Other values (9)
56 

Length

Max length: 64
Median length: 59
Mean length: 45.309524
Min length: 12

Characters and Unicode

Total characters: 30448
Distinct characters: 61
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): 0.4%

Sample

1st row: C41 - Ossos e Das Cartilagens Articulares de Outras Localizações
2nd row: C77 - Secundária e Não Especificada Dos Gânglios Linfáticos
3rd row: C34 - Bronquios e Pulmoes
4th row: C77 - Secundária e Não Especificada Dos Gânglios Linfáticos
5th row: C77 - Secundária e Não Especificada Dos Gânglios Linfáticos

Common Values

ValueCountFrequency (%)
C34 - Bronquios e Pulmoes 196
 
4.6%
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações 151
 
3.5%
C22 - Fígado e Das Vias Biliares Intra-hepáticas 107
 
2.5%
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos 81
 
1.9%
C40 - Ossos e Cartilagens Articulares Dos Membros 81
 
1.9%
C38 - Coração, Mediastino e Pleura, 20
 
0.5%
C48 - Tecidos Moles do Retroperitônio e do Peritônio 14
 
0.3%
C71 - Encefalo 8
 
0.2%
C49 - Tecido Conjuntivo e de Outros Tecidos Moles 6
 
0.1%
C44 - Pele nao-melanoma 3
 
0.1%
Other values (4) 5
 
0.1%
(Missing) 3600
84.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
672
 
13.2%
e 657
 
12.9%
das 258
 
5.1%
ossos 232
 
4.6%
cartilagens 232
 
4.6%
articulares 232
 
4.6%
c34 196
 
3.9%
bronquios 196
 
3.9%
pulmoes 196
 
3.9%
dos 162
 
3.2%
Other values (46) 2044
40.3%

Most occurring characters

ValueCountFrequency (%)
(space) 4405
14.5%
s 2986
 
9.8%
a 2268
 
7.4%
e 2233
 
7.3%
i 1861
 
6.1%
o 1781
 
5.8%
r 1510
 
5.0%
l 1057
 
3.5%
t 989
 
3.2%
C 931
 
3.1%
Other values (51) 10427
34.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 20318
66.7%
Space Separator 4405
 
14.5%
Uppercase Letter 3558
 
11.7%
Decimal Number 1344
 
4.4%
Dash Punctuation 782
 
2.6%
Other Punctuation 41
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 2986
14.7%
a 2268
11.2%
e 2233
11.0%
i 1861
9.2%
o 1781
8.8%
r 1510
 
7.4%
l 1057
 
5.2%
t 989
 
4.9%
u 889
 
4.4%
n 854
 
4.2%
Other values (21) 3890
19.1%
Uppercase Letter
ValueCountFrequency (%)
C 931
26.2%
D 420
11.8%
O 391
11.0%
B 304
 
8.5%
P 233
 
6.5%
L 232
 
6.5%
A 232
 
6.5%
M 121
 
3.4%
I 107
 
3.0%
V 107
 
3.0%
Other values (7) 480
13.5%
Decimal Number
ValueCountFrequency (%)
4 455
33.9%
3 216
16.1%
2 215
16.0%
7 171
 
12.7%
1 160
 
11.9%
0 81
 
6.0%
8 35
 
2.6%
9 6
 
0.4%
6 3
 
0.2%
5 2
 
0.1%
Space Separator
ValueCountFrequency (%)
(space) 4405
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 782
100.0%
Other Punctuation
ValueCountFrequency (%)
, 41
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 23876
78.4%
Common 6572
 
21.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 2986
12.5%
a 2268
 
9.5%
e 2233
 
9.4%
i 1861
 
7.8%
o 1781
 
7.5%
r 1510
 
6.3%
l 1057
 
4.4%
t 989
 
4.1%
C 931
 
3.9%
u 889
 
3.7%
Other values (38) 7371
30.9%
Common
ValueCountFrequency (%)
(space) 4405
67.0%
- 782
 
11.9%
4 455
 
6.9%
3 216
 
3.3%
2 215
 
3.3%
7 171
 
2.6%
1 160
 
2.4%
0 81
 
1.2%
, 41
 
0.6%
8 35
 
0.5%
Other values (3) 11
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 29539
97.0%
None 909
 
3.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4405
14.9%
s 2986
 
10.1%
a 2268
 
7.7%
e 2233
 
7.6%
i 1861
 
6.3%
o 1781
 
6.0%
r 1510
 
5.1%
l 1057
 
3.6%
t 989
 
3.3%
C 931
 
3.2%
Other values (43) 9518
32.2%
None
ValueCountFrequency (%)
á 269
29.6%
ç 171
18.8%
õ 151
16.6%
í 107
 
11.8%
ã 101
 
11.1%
â 81
 
8.9%
ô 28
 
3.1%
é 1
 
0.1%
Distinct9
Distinct (%)23.1%
Missing4233
Missing (%)99.1%
Memory size33.5 KiB
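
The percentages in this summary appear to use two different denominators: Missing (%) is taken over all 4272 observations, while Distinct (%) is taken over the non-missing values only (9 distinct out of 39 present). The sketch below reproduces both figures under that assumption; the function names are hypothetical and are not part of the profiling tool.

```python
def missing_pct(n_rows, n_missing):
    """Share of missing cells relative to all observations, in percent."""
    return 100.0 * n_missing / n_rows


def distinct_pct(n_rows, n_missing, n_distinct):
    """Share of distinct values relative to non-missing cells, in percent."""
    n_present = n_rows - n_missing
    return 100.0 * n_distinct / n_present


# Figures taken from the summary above: 4272 rows, 4233 missing, 9 distinct.
missing = round(missing_pct(4272, 4233), 1)
distinct = round(distinct_pct(4272, 4233, 9), 1)
print(missing, distinct)
```

The same arithmetic matches the other variables in this section, for example 17 distinct values over 374 present cells for the 4.5% figure.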
C34 - Bronquios e Pulmoes
C22 - Fígado e Das Vias Biliares Intra-hepáticas
C40 - Ossos e Cartilagens Articulares Dos Membros
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos
C71 - Encefalo
Other values (4)

Length

Max length64
Median length52
Mean length41.846154
Min length13

Characters and Unicode

Total characters1632
Distinct characters56
Distinct categories7
Distinct scripts2
Distinct blocks2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
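
As an illustration of how such per-category character counts can be derived, the following minimal sketch uses Python's standard `unicodedata` module. The mapping from two-letter category codes to the long names shown in this report, and the helper name `category_counts`, are assumptions made for the example; the report generator may compute its tables differently.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "Sm": "Math Symbol",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw code for categories not listed above.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Two values that appear in the sample rows of this variable.
sample = ["C34 - Bronquios e Pulmoes", "C71 - Encefalo"]
counts = category_counts(sample)
print(counts.most_common())
```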

Unique

Unique2
Unique (%)5.1%

Sample

1st rowC34 - Bronquios e Pulmoes
2nd rowC22 - Fígado e Das Vias Biliares Intra-hepáticas
3rd rowC71 - Encefalo
4th rowC22 - Fígado e Das Vias Biliares Intra-hepáticas
5th rowC15 - Esofago

Common Values

ValueCountFrequency (%)
C34 - Bronquios e Pulmoes 8
 
0.2%
C22 - Fígado e Das Vias Biliares Intra-hepáticas 8
 
0.2%
C40 - Ossos e Cartilagens Articulares Dos Membros 7
 
0.2%
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos 5
 
0.1%
C71 - Encefalo 4
 
0.1%
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações 3
 
0.1%
C48 - Tecidos Moles do Retroperitônio e do Peritônio 2
 
< 0.1%
C15 - Esofago 1
 
< 0.1%
C74 - Glândula Supra-renal (Glândula Adrenal) 1
 
< 0.1%
(Missing) 4233
99.1%
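
Every value listed here follows the pattern "code - description" (for example `C34 - Bronquios e Pulmoes`). If the code and the description are needed separately, the label can be split on the first `" - "`. This is a minimal sketch; `split_topography` is a hypothetical helper, not something the dataset or the report provides.

```python
def split_topography(label):
    """Split 'C34 - Bronquios e Pulmoes' into ('C34', 'Bronquios e Pulmoes').

    Only the first ' - ' is used as the separator, so hyphens inside the
    description (as in 'Intra-hepáticas') are left untouched.
    """
    code, sep, description = label.partition(" - ")
    if not sep:
        # No separator found: treat the whole label as a description.
        return None, label.strip()
    return code.strip(), description.strip()


pairs = [
    split_topography("C34 - Bronquios e Pulmoes"),
    split_topography("C22 - Fígado e Das Vias Biliares Intra-hepáticas"),
]
print(pairs)
```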

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
39
 
14.2%
e 33
 
12.0%
dos 12
 
4.4%
das 11
 
4.0%
articulares 10
 
3.6%
cartilagens 10
 
3.6%
ossos 10
 
3.6%
c34 8
 
2.9%
intra-hepáticas 8
 
2.9%
biliares 8
 
2.9%
Other values (31) 125
45.6%

Most occurring characters

ValueCountFrequency (%)
235
14.4%
s 146
 
8.9%
e 116
 
7.1%
a 115
 
7.0%
i 103
 
6.3%
o 99
 
6.1%
r 78
 
4.8%
l 56
 
3.4%
n 53
 
3.2%
t 50
 
3.1%
Other values (46) 581
35.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1074
65.8%
Space Separator 235
 
14.4%
Uppercase Letter 195
 
11.9%
Decimal Number 78
 
4.8%
Dash Punctuation 48
 
2.9%
Open Punctuation 1
 
0.1%
Close Punctuation 1
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 146
13.6%
e 116
10.8%
a 115
10.7%
i 103
9.6%
o 99
9.2%
r 78
 
7.3%
l 56
 
5.2%
n 53
 
4.9%
t 50
 
4.7%
c 47
 
4.4%
Other values (17) 211
19.6%
Uppercase Letter
ValueCountFrequency (%)
C 49
25.1%
D 23
11.8%
B 16
 
8.2%
O 13
 
6.7%
A 11
 
5.6%
E 10
 
5.1%
P 10
 
5.1%
M 9
 
4.6%
F 8
 
4.1%
L 8
 
4.1%
Other values (7) 38
19.5%
Decimal Number
ValueCountFrequency (%)
4 21
26.9%
2 16
20.5%
7 15
19.2%
1 8
 
10.3%
3 8
 
10.3%
0 7
 
9.0%
8 2
 
2.6%
5 1
 
1.3%
Space Separator
ValueCountFrequency (%)
235
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 48
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1269
77.8%
Common 363
 
22.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 146
 
11.5%
e 116
 
9.1%
a 115
 
9.1%
i 103
 
8.1%
o 99
 
7.8%
r 78
 
6.1%
l 56
 
4.4%
n 53
 
4.2%
t 50
 
3.9%
C 49
 
3.9%
Other values (34) 404
31.8%
Common
ValueCountFrequency (%)
235
64.7%
- 48
 
13.2%
4 21
 
5.8%
2 16
 
4.4%
7 15
 
4.1%
1 8
 
2.2%
3 8
 
2.2%
0 7
 
1.9%
8 2
 
0.6%
5 1
 
0.3%
Other values (2) 2
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1584
97.1%
None 48
 
2.9%

Most frequent character per block

ASCII
ValueCountFrequency (%)
235
14.8%
s 146
 
9.2%
e 116
 
7.3%
a 115
 
7.3%
i 103
 
6.5%
o 99
 
6.2%
r 78
 
4.9%
l 56
 
3.5%
n 53
 
3.3%
t 50
 
3.2%
Other values (39) 533
33.6%
None
ValueCountFrequency (%)
á 18
37.5%
í 8
16.7%
â 7
 
14.6%
ã 5
 
10.4%
ô 4
 
8.3%
ç 3
 
6.2%
õ 3
 
6.2%
Distinct17
Distinct (%)4.5%
Missing3898
Missing (%)91.2%
Memory size33.5 KiB
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações
106 
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos
68 
C22 - Fígado e Das Vias Biliares Intra-hepáticas
53 
C34 - Bronquios e Pulmoes
52 
C40 - Ossos e Cartilagens Articulares Dos Membros
35 
Other values (12)
60 

Length

Max length64
Median length59
Mean length49.264706
Min length10

Characters and Unicode

Total characters18425
Distinct characters62
Distinct categories8
Distinct scripts2
Distinct blocks2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique4
Unique (%)1.1%

Sample

1st rowC22 - Fígado e Das Vias Biliares Intra-hepáticas
2nd rowC34 - Bronquios e Pulmoes
3rd rowC22 - Fígado e Das Vias Biliares Intra-hepáticas
4th rowC40 - Ossos e Cartilagens Articulares Dos Membros
5th rowC71 - Encefalo

Common Values

ValueCountFrequency (%)
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações 106
 
2.5%
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos 68
 
1.6%
C22 - Fígado e Das Vias Biliares Intra-hepáticas 53
 
1.2%
C34 - Bronquios e Pulmoes 52
 
1.2%
C40 - Ossos e Cartilagens Articulares Dos Membros 35
 
0.8%
C38 - Coração, Mediastino e Pleura, 23
 
0.5%
C48 - Tecidos Moles do Retroperitônio e do Peritônio 11
 
0.3%
C71 - Encefalo 8
 
0.2%
C74 - Glândula Supra-renal (Glândula Adrenal) 4
 
0.1%
C49 - Tecido Conjuntivo e de Outros Tecidos Moles 3
 
0.1%
Other values (7) 11
 
0.3%
(Missing) 3898
91.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
374
 
12.5%
e 353
 
11.8%
das 159
 
5.3%
ossos 141
 
4.7%
cartilagens 141
 
4.7%
articulares 141
 
4.7%
de 111
 
3.7%
outras 108
 
3.6%
c41 106
 
3.5%
localizações 106
 
3.5%
Other values (56) 1255
41.9%

Most occurring characters

ValueCountFrequency (%)
2621
14.2%
s 1750
 
9.5%
a 1498
 
8.1%
e 1316
 
7.1%
i 1177
 
6.4%
o 976
 
5.3%
r 896
 
4.9%
l 640
 
3.5%
t 631
 
3.4%
c 603
 
3.3%
Other values (52) 6317
34.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 12436
67.5%
Space Separator 2621
 
14.2%
Uppercase Letter 2134
 
11.6%
Decimal Number 748
 
4.1%
Dash Punctuation 432
 
2.3%
Other Punctuation 46
 
0.2%
Open Punctuation 4
 
< 0.1%
Close Punctuation 4
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 1750
14.1%
a 1498
12.0%
e 1316
10.6%
i 1177
9.5%
o 976
7.8%
r 896
 
7.2%
l 640
 
5.1%
t 631
 
5.1%
c 603
 
4.8%
n 540
 
4.3%
Other values (20) 2409
19.4%
Uppercase Letter
ValueCountFrequency (%)
C 541
25.4%
D 262
12.3%
O 253
11.9%
L 174
 
8.2%
A 145
 
6.8%
B 105
 
4.9%
P 90
 
4.2%
E 80
 
3.7%
G 79
 
3.7%
M 76
 
3.6%
Other values (7) 329
15.4%
Decimal Number
ValueCountFrequency (%)
4 213
28.5%
7 152
20.3%
1 114
15.2%
2 108
14.4%
3 75
 
10.0%
0 40
 
5.3%
8 34
 
4.5%
5 8
 
1.1%
9 3
 
0.4%
6 1
 
0.1%
Space Separator
ValueCountFrequency (%)
2621
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 432
100.0%
Other Punctuation
ValueCountFrequency (%)
, 46
100.0%
Open Punctuation
ValueCountFrequency (%)
( 4
100.0%
Close Punctuation
ValueCountFrequency (%)
) 4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 14570
79.1%
Common 3855
 
20.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 1750
 
12.0%
a 1498
 
10.3%
e 1316
 
9.0%
i 1177
 
8.1%
o 976
 
6.7%
r 896
 
6.1%
l 640
 
4.4%
t 631
 
4.3%
c 603
 
4.1%
C 541
 
3.7%
Other values (37) 4542
31.2%
Common
ValueCountFrequency (%)
2621
68.0%
- 432
 
11.2%
4 213
 
5.5%
7 152
 
3.9%
1 114
 
3.0%
2 108
 
2.8%
3 75
 
1.9%
, 46
 
1.2%
0 40
 
1.0%
8 34
 
0.9%
Other values (5) 20
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 17755
96.4%
None 670
 
3.6%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2621
14.8%
s 1750
 
9.9%
a 1498
 
8.4%
e 1316
 
7.4%
i 1177
 
6.6%
o 976
 
5.5%
r 896
 
5.0%
l 640
 
3.6%
t 631
 
3.6%
c 603
 
3.4%
Other values (44) 5647
31.8%
None
ValueCountFrequency (%)
á 189
28.2%
ç 129
19.3%
õ 106
15.8%
ã 91
13.6%
â 78
11.6%
í 53
 
7.9%
ô 22
 
3.3%
ó 2
 
0.3%
Distinct6
Distinct (%)40.0%
Missing4257
Missing (%)99.6%
Memory size33.5 KiB
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações
C34 - Bronquios e Pulmoes
C74 - Glândula Supra-renal (Glândula Adrenal)
C22 - Fígado e Das Vias Biliares Intra-hepáticas

Length

Max length64
Median length59
Mean length52
Min length25

Characters and Unicode

Total characters780
Distinct characters53
Distinct categories7
Distinct scripts2
Distinct blocks2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1
Unique (%)6.7%

Sample

1st rowC77 - Secundária e Não Especificada Dos Gânglios Linfáticos
2nd rowC34 - Bronquios e Pulmoes
3rd rowC34 - Bronquios e Pulmoes
4th rowC74 - Glândula Supra-renal (Glândula Adrenal)
5th rowC77 - Secundária e Não Especificada Dos Gânglios Linfáticos

Common Values

ValueCountFrequency (%)
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos 4
 
0.1%
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações 4
 
0.1%
C34 - Bronquios e Pulmoes 2
 
< 0.1%
C74 - Glândula Supra-renal (Glândula Adrenal) 2
 
< 0.1%
C22 - Fígado e Das Vias Biliares Intra-hepáticas 2
 
< 0.1%
C48 - Tecidos Moles do Retroperitônio e do Peritônio 1
 
< 0.1%
(Missing) 4257
99.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
15
 
12.2%
e 13
 
10.6%
das 6
 
4.9%
c77 4
 
3.3%
ossos 4
 
3.3%
glândula 4
 
3.3%
localizações 4
 
3.3%
outras 4
 
3.3%
de 4
 
3.3%
articulares 4
 
3.3%
Other values (25) 61
49.6%

Most occurring characters

ValueCountFrequency (%)
108
 
13.8%
a 62
 
7.9%
s 62
 
7.9%
e 52
 
6.7%
i 51
 
6.5%
o 39
 
5.0%
r 35
 
4.5%
l 33
 
4.2%
n 30
 
3.8%
c 27
 
3.5%
Other values (43) 281
36.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 530
67.9%
Space Separator 108
 
13.8%
Uppercase Letter 89
 
11.4%
Decimal Number 30
 
3.8%
Dash Punctuation 19
 
2.4%
Open Punctuation 2
 
0.3%
Close Punctuation 2
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 62
11.7%
s 62
11.7%
e 52
9.8%
i 51
9.6%
o 39
 
7.4%
r 35
 
6.6%
l 33
 
6.2%
n 30
 
5.7%
c 27
 
5.1%
d 23
 
4.3%
Other values (16) 116
21.9%
Uppercase Letter
ValueCountFrequency (%)
C 19
21.3%
D 10
11.2%
O 8
9.0%
L 8
9.0%
G 8
9.0%
S 6
 
6.7%
A 6
 
6.7%
E 4
 
4.5%
N 4
 
4.5%
B 4
 
4.5%
Other values (7) 12
13.5%
Decimal Number
ValueCountFrequency (%)
7 10
33.3%
4 9
30.0%
1 4
 
13.3%
2 4
 
13.3%
3 2
 
6.7%
8 1
 
3.3%
Space Separator
ValueCountFrequency (%)
108
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 19
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 619
79.4%
Common 161
 
20.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 62
 
10.0%
s 62
 
10.0%
e 52
 
8.4%
i 51
 
8.2%
o 39
 
6.3%
r 35
 
5.7%
l 33
 
5.3%
n 30
 
4.8%
c 27
 
4.4%
d 23
 
3.7%
Other values (33) 205
33.1%
Common
ValueCountFrequency (%)
108
67.1%
- 19
 
11.8%
7 10
 
6.2%
4 9
 
5.6%
1 4
 
2.5%
2 4
 
2.5%
( 2
 
1.2%
) 2
 
1.2%
3 2
 
1.2%
8 1
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 746
95.6%
None 34
 
4.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
108
14.5%
a 62
 
8.3%
s 62
 
8.3%
e 52
 
7.0%
i 51
 
6.8%
o 39
 
5.2%
r 35
 
4.7%
l 33
 
4.4%
n 30
 
4.0%
c 27
 
3.6%
Other values (36) 247
33.1%
None
ValueCountFrequency (%)
á 10
29.4%
â 8
23.5%
ã 4
 
11.8%
ç 4
 
11.8%
õ 4
 
11.8%
í 2
 
5.9%
ô 2
 
5.9%
Distinct14
Distinct (%)8.0%
Missing4097
Missing (%)95.9%
Memory size33.5 KiB
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos
42 
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações
39 
C22 - Fígado e Das Vias Biliares Intra-hepáticas
22 
C34 - Bronquios e Pulmoes
21 
C40 - Ossos e Cartilagens Articulares Dos Membros
15 
Other values (9)
36 

Length

Max length64
Median length52
Mean length48.914286
Min length14

Characters and Unicode

Total characters8560
Distinct characters63
Distinct categories8
Distinct scripts2
Distinct blocks2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique3
Unique (%)1.7%

Sample

1st rowC77 - Secundária e Não Especificada Dos Gânglios Linfáticos
2nd rowC64 - Rim, Exceto Pelve Renal
3rd rowC22 - Fígado e Das Vias Biliares Intra-hepáticas
4th rowC71 - Encefalo
5th rowC22 - Fígado e Das Vias Biliares Intra-hepáticas

Common Values

ValueCountFrequency (%)
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos 42
 
1.0%
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações 39
 
0.9%
C22 - Fígado e Das Vias Biliares Intra-hepáticas 22
 
0.5%
C34 - Bronquios e Pulmoes 21
 
0.5%
C40 - Ossos e Cartilagens Articulares Dos Membros 15
 
0.4%
C38 - Coração, Mediastino e Pleura, 12
 
0.3%
C71 - Encefalo 7
 
0.2%
C48 - Tecidos Moles do Retroperitônio e do Peritônio 7
 
0.2%
C74 - Glândula Supra-renal (Glândula Adrenal) 3
 
0.1%
C42 - Sistema hematopoiético e reticuloendotelial 2
 
< 0.1%
Other values (4) 5
 
0.1%
(Missing) 4097
95.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
175
 
12.6%
e 162
 
11.7%
das 61
 
4.4%
dos 57
 
4.1%
articulares 54
 
3.9%
cartilagens 54
 
3.9%
ossos 54
 
3.9%
linfáticos 42
 
3.0%
c77 42
 
3.0%
especificada 42
 
3.0%
Other values (50) 642
46.4%

Most occurring characters

ValueCountFrequency (%)
1210
14.1%
s 751
 
8.8%
a 644
 
7.5%
e 607
 
7.1%
i 584
 
6.8%
o 493
 
5.8%
r 382
 
4.5%
c 307
 
3.6%
l 286
 
3.3%
t 282
 
3.3%
Other values (53) 3014
35.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5780
67.5%
Space Separator 1210
 
14.1%
Uppercase Letter 989
 
11.6%
Decimal Number 350
 
4.1%
Dash Punctuation 200
 
2.3%
Other Punctuation 25
 
0.3%
Open Punctuation 3
 
< 0.1%
Close Punctuation 3
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 751
13.0%
a 644
11.1%
e 607
10.5%
i 584
10.1%
o 493
8.5%
r 382
 
6.6%
c 307
 
5.3%
l 286
 
4.9%
t 282
 
4.9%
n 276
 
4.8%
Other values (21) 1168
20.2%
Uppercase Letter
ValueCountFrequency (%)
C 243
24.6%
D 118
11.9%
O 95
 
9.6%
L 81
 
8.2%
A 57
 
5.8%
E 51
 
5.2%
G 48
 
4.9%
S 47
 
4.8%
B 43
 
4.3%
P 42
 
4.2%
Other values (7) 164
16.6%
Decimal Number
ValueCountFrequency (%)
7 94
26.9%
4 90
25.7%
2 47
13.4%
1 47
13.4%
3 33
 
9.4%
8 19
 
5.4%
0 15
 
4.3%
9 2
 
0.6%
6 2
 
0.6%
5 1
 
0.3%
Space Separator
ValueCountFrequency (%)
1210
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 200
100.0%
Other Punctuation
ValueCountFrequency (%)
, 25
100.0%
Open Punctuation
ValueCountFrequency (%)
( 3
100.0%
Close Punctuation
ValueCountFrequency (%)
) 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 6769
79.1%
Common 1791
 
20.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 751
 
11.1%
a 644
 
9.5%
e 607
 
9.0%
i 584
 
8.6%
o 493
 
7.3%
r 382
 
5.6%
c 307
 
4.5%
l 286
 
4.2%
t 282
 
4.2%
n 276
 
4.1%
Other values (38) 2157
31.9%
Common
ValueCountFrequency (%)
1210
67.6%
- 200
 
11.2%
7 94
 
5.2%
4 90
 
5.0%
2 47
 
2.6%
1 47
 
2.6%
3 33
 
1.8%
, 25
 
1.4%
8 19
 
1.1%
0 15
 
0.8%
Other values (5) 11
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8224
96.1%
None 336
 
3.9%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1210
14.7%
s 751
 
9.1%
a 644
 
7.8%
e 607
 
7.4%
i 584
 
7.1%
o 493
 
6.0%
r 382
 
4.6%
c 307
 
3.7%
l 286
 
3.5%
t 282
 
3.4%
Other values (45) 2678
32.6%
None
ValueCountFrequency (%)
á 106
31.5%
ã 54
16.1%
ç 51
15.2%
â 48
14.3%
õ 39
 
11.6%
í 22
 
6.5%
ô 14
 
4.2%
é 2
 
0.6%
Distinct4
Distinct (%)66.7%
Missing4266
Missing (%)99.9%
Memory size33.5 KiB
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações
C49 - Tecido Conjuntivo e de Outros Tecidos Moles
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos
C34 - Bronquios e Pulmoes

Length

Max length64
Median length61.5
Mean length54.166667
Min length25

Characters and Unicode

Total characters325
Distinct characters45
Distinct categories5
Distinct scripts2
Distinct blocks2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique3
Unique (%)50.0%

Sample

1st rowC49 - Tecido Conjuntivo e de Outros Tecidos Moles
2nd rowC41 - Ossos e Das Cartilagens Articulares de Outras Localizações
3rd rowC41 - Ossos e Das Cartilagens Articulares de Outras Localizações
4th rowC77 - Secundária e Não Especificada Dos Gânglios Linfáticos
5th rowC34 - Bronquios e Pulmoes

Common Values

ValueCountFrequency (%)
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações 3
 
0.1%
C49 - Tecido Conjuntivo e de Outros Tecidos Moles 1
 
< 0.1%
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos 1
 
< 0.1%
C34 - Bronquios e Pulmoes 1
 
< 0.1%
(Missing) 4266
99.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
6
 
11.3%
e 6
 
11.3%
de 4
 
7.5%
c41 3
 
5.7%
articulares 3
 
5.7%
localizações 3
 
5.7%
outras 3
 
5.7%
cartilagens 3
 
5.7%
das 3
 
5.7%
ossos 3
 
5.7%
Other values (16) 16
30.2%

Most occurring characters

ValueCountFrequency (%)
47
14.5%
s 33
 
10.2%
e 25
 
7.7%
a 24
 
7.4%
o 19
 
5.8%
i 19
 
5.8%
r 15
 
4.6%
t 12
 
3.7%
c 12
 
3.7%
l 12
 
3.7%
Other values (35) 107
32.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 223
68.6%
Space Separator 47
 
14.5%
Uppercase Letter 37
 
11.4%
Decimal Number 12
 
3.7%
Dash Punctuation 6
 
1.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 33
14.8%
e 25
11.2%
a 24
10.8%
o 19
8.5%
i 19
8.5%
r 15
 
6.7%
t 12
 
5.4%
c 12
 
5.4%
l 12
 
5.4%
u 11
 
4.9%
Other values (15) 41
18.4%
Uppercase Letter
ValueCountFrequency (%)
C 10
27.0%
O 7
18.9%
D 4
 
10.8%
L 4
 
10.8%
A 3
 
8.1%
T 2
 
5.4%
P 1
 
2.7%
B 1
 
2.7%
G 1
 
2.7%
E 1
 
2.7%
Other values (3) 3
 
8.1%
Decimal Number
ValueCountFrequency (%)
4 5
41.7%
1 3
25.0%
7 2
 
16.7%
3 1
 
8.3%
9 1
 
8.3%
Space Separator
ValueCountFrequency (%)
47
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 6
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 260
80.0%
Common 65
 
20.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 33
12.7%
e 25
 
9.6%
a 24
 
9.2%
o 19
 
7.3%
i 19
 
7.3%
r 15
 
5.8%
t 12
 
4.6%
c 12
 
4.6%
l 12
 
4.6%
u 11
 
4.2%
Other values (28) 78
30.0%
Common
ValueCountFrequency (%)
47
72.3%
- 6
 
9.2%
4 5
 
7.7%
1 3
 
4.6%
7 2
 
3.1%
3 1
 
1.5%
9 1
 
1.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 315
96.9%
None 10
 
3.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
47
14.9%
s 33
 
10.5%
e 25
 
7.9%
a 24
 
7.6%
o 19
 
6.0%
i 19
 
6.0%
r 15
 
4.8%
t 12
 
3.8%
c 12
 
3.8%
l 12
 
3.8%
Other values (30) 97
30.8%
None
ValueCountFrequency (%)
ç 3
30.0%
õ 3
30.0%
á 2
20.0%
â 1
 
10.0%
ã 1
 
10.0%
Distinct13
Distinct (%)19.4%
Missing4205
Missing (%)98.4%
Memory size33.5 KiB
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações
21 
C22 - Fígado e Das Vias Biliares Intra-hepáticas
11 
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos
C34 - Bronquios e Pulmoes
C40 - Ossos e Cartilagens Articulares Dos Membros
Other values (8)
15 

Length

Max length64
Median length59
Mean length50.253731
Min length12

Characters and Unicode

Total characters3367
Distinct characters63
Distinct categories8
Distinct scripts2
Distinct blocks2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique4
Unique (%)6.0%

Sample

1st rowC38 - Coração, Mediastino e Pleura,
2nd rowC22 - Fígado e Das Vias Biliares Intra-hepáticas
3rd rowC77 - Secundária e Não Especificada Dos Gânglios Linfáticos
4th rowC22 - Fígado e Das Vias Biliares Intra-hepáticas
5th rowC48 - Tecidos Moles do Retroperitônio e do Peritônio

Common Values

ValueCountFrequency (%)
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações 21
 
0.5%
C22 - Fígado e Das Vias Biliares Intra-hepáticas 11
 
0.3%
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos 9
 
0.2%
C34 - Bronquios e Pulmoes 6
 
0.1%
C40 - Ossos e Cartilagens Articulares Dos Membros 5
 
0.1%
C74 - Glândula Supra-renal (Glândula Adrenal) 4
 
0.1%
C48 - Tecidos Moles do Retroperitônio e do Peritônio 3
 
0.1%
C38 - Coração, Mediastino e Pleura, 2
 
< 0.1%
C71 - Encefalo 2
 
< 0.1%
C75 - Outras Glândulas Endócrinas e de Estruturas Relacionadas 1
 
< 0.1%
Other values (3) 3
 
0.1%
(Missing) 4205
98.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
67
 
12.4%
e 59
 
10.9%
das 32
 
5.9%
ossos 26
 
4.8%
cartilagens 26
 
4.8%
articulares 26
 
4.8%
de 23
 
4.2%
outras 22
 
4.1%
c41 21
 
3.9%
localizações 21
 
3.9%
Other values (46) 219
40.4%

Most occurring characters

ValueCountFrequency (%)
475
14.1%
s 312
 
9.3%
a 292
 
8.7%
e 240
 
7.1%
i 203
 
6.0%
r 172
 
5.1%
o 159
 
4.7%
l 134
 
4.0%
t 120
 
3.6%
c 103
 
3.1%
Other values (53) 1157
34.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2277
67.6%
Space Separator 475
 
14.1%
Uppercase Letter 387
 
11.5%
Decimal Number 134
 
4.0%
Dash Punctuation 82
 
2.4%
Close Punctuation 4
 
0.1%
Open Punctuation 4
 
0.1%
Other Punctuation 4
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 312
13.7%
a 292
12.8%
e 240
10.5%
i 203
8.9%
r 172
 
7.6%
o 159
 
7.0%
l 134
 
5.9%
t 120
 
5.3%
c 103
 
4.5%
n 102
 
4.5%
Other values (21) 440
19.3%
Uppercase Letter
ValueCountFrequency (%)
C 96
24.8%
O 50
12.9%
D 46
11.9%
A 30
 
7.8%
L 30
 
7.8%
B 18
 
4.7%
G 18
 
4.7%
E 13
 
3.4%
S 13
 
3.4%
P 11
 
2.8%
Other values (7) 62
16.0%
Decimal Number
ValueCountFrequency (%)
4 40
29.9%
7 26
19.4%
1 23
17.2%
2 22
16.4%
3 8
 
6.0%
8 5
 
3.7%
0 5
 
3.7%
5 2
 
1.5%
6 2
 
1.5%
9 1
 
0.7%
Space Separator
ValueCountFrequency (%)
475
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 82
100.0%
Close Punctuation
ValueCountFrequency (%)
) 4
100.0%
Open Punctuation
ValueCountFrequency (%)
( 4
100.0%
Other Punctuation
ValueCountFrequency (%)
, 4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2664
79.1%
Common 703
 
20.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 312
 
11.7%
a 292
 
11.0%
e 240
 
9.0%
i 203
 
7.6%
r 172
 
6.5%
o 159
 
6.0%
l 134
 
5.0%
t 120
 
4.5%
c 103
 
3.9%
n 102
 
3.8%
Other values (38) 827
31.0%
Common
ValueCountFrequency (%)
475
67.6%
- 82
 
11.7%
4 40
 
5.7%
7 26
 
3.7%
1 23
 
3.3%
2 22
 
3.1%
3 8
 
1.1%
8 5
 
0.7%
0 5
 
0.7%
) 4
 
0.6%
Other values (5) 13
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3247
96.4%
None 120
 
3.6%

Most frequent character per block

ASCII
ValueCountFrequency (%)
475
14.6%
s 312
 
9.6%
a 292
 
9.0%
e 240
 
7.4%
i 203
 
6.3%
r 172
 
5.3%
o 159
 
4.9%
l 134
 
4.1%
t 120
 
3.7%
c 103
 
3.2%
Other values (45) 1037
31.9%
None
ValueCountFrequency (%)
á 29
24.2%
ç 23
19.2%
õ 21
17.5%
â 18
15.0%
ã 11
 
9.2%
í 11
 
9.2%
ô 6
 
5.0%
ó 1
 
0.8%

metastase_ao_diagnostico_cid_o_4_2
Categorical

MISSING  UNIFORM 

Distinct2
Distinct (%)100.0%
Missing4270
Missing (%)> 99.9%
Memory size33.5 KiB
C22 - Fígado e Das Vias Biliares Intra-hepáticas
C48 - Tecidos Moles do Retroperitônio e do Peritônio

Length

Max length52
Median length50
Mean length50
Min length48

Characters and Unicode

Total characters100
Distinct characters32
Distinct categories5
Distinct scripts2
Distinct blocks2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2
Unique (%)100.0%

Sample

1st rowC22 - Fígado e Das Vias Biliares Intra-hepáticas
2nd rowC48 - Tecidos Moles do Retroperitônio e do Peritônio

Common Values

ValueCountFrequency (%)
C22 - Fígado e Das Vias Biliares Intra-hepáticas 1
 
< 0.1%
C48 - Tecidos Moles do Retroperitônio e do Peritônio 1
 
< 0.1%
(Missing) 4270
> 99.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2
11.8%
e 2
11.8%
do 2
11.8%
c22 1
 
5.9%
fígado 1
 
5.9%
das 1
 
5.9%
vias 1
 
5.9%
biliares 1
 
5.9%
intra-hepáticas 1
 
5.9%
c48 1
 
5.9%
Other values (4) 4
23.5%

Most occurring characters

ValueCountFrequency (%)
15
15.0%
e 9
 
9.0%
i 9
 
9.0%
o 8
 
8.0%
a 6
 
6.0%
s 6
 
6.0%
t 5
 
5.0%
r 5
 
5.0%
d 4
 
4.0%
- 3
 
3.0%
Other values (22) 30
30.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 67
67.0%
Space Separator 15
 
15.0%
Uppercase Letter 11
 
11.0%
Decimal Number 4
 
4.0%
Dash Punctuation 3
 
3.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 9
13.4%
i 9
13.4%
o 8
11.9%
a 6
9.0%
s 6
9.0%
t 5
7.5%
r 5
7.5%
d 4
 
6.0%
n 3
 
4.5%
ô 2
 
3.0%
Other values (7) 10
14.9%
Uppercase Letter
ValueCountFrequency (%)
C 2
18.2%
B 1
9.1%
I 1
9.1%
V 1
9.1%
D 1
9.1%
T 1
9.1%
M 1
9.1%
R 1
9.1%
F 1
9.1%
P 1
9.1%
Decimal Number
ValueCountFrequency (%)
2 2
50.0%
4 1
25.0%
8 1
25.0%
Space Separator
ValueCountFrequency (%)
15
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 78
78.0%
Common 22
 
22.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 9
11.5%
i 9
11.5%
o 8
 
10.3%
a 6
 
7.7%
s 6
 
7.7%
t 5
 
6.4%
r 5
 
6.4%
d 4
 
5.1%
n 3
 
3.8%
C 2
 
2.6%
Other values (17) 21
26.9%
Common
ValueCountFrequency (%)
15
68.2%
- 3
 
13.6%
2 2
 
9.1%
4 1
 
4.5%
8 1
 
4.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 96
96.0%
None 4
 
4.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
15
15.6%
e 9
 
9.4%
i 9
 
9.4%
o 8
 
8.3%
a 6
 
6.2%
s 6
 
6.2%
t 5
 
5.2%
r 5
 
5.2%
d 4
 
4.2%
- 3
 
3.1%
Other values (19) 26
27.1%
None
ValueCountFrequency (%)
ô 2
50.0%
á 1
25.0%
í 1
25.0%

data_do_tratamento_1
Categorical

HIGH CARDINALITY  UNIFORM 

Distinct2405
Distinct (%)56.7%
Missing28
Missing (%)0.7%
Memory size33.5 KiB
2015-01-08
 
7
2015-05-13
 
7
2017-11-01
 
6
2016-01-04
 
6
2016-07-05
 
6
Other values (2400)
4212 

Length

Max length10
Median length10
Mean length10
Min length10

Characters and Unicode

Total characters42440
Distinct characters11
Distinct categories2
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1279
Unique (%)30.1%

Sample

1st row2008-08-15
2nd row2008-05-29
3rd row2008-04-07
4th row2008-09-29
5th row2008-09-16

Common Values

ValueCountFrequency (%)
2015-01-08 7
 
0.2%
2015-05-13 7
 
0.2%
2017-11-01 6
 
0.1%
2016-01-04 6
 
0.1%
2016-07-05 6
 
0.1%
2017-07-29 6
 
0.1%
2015-11-12 6
 
0.1%
2017-03-06 6
 
0.1%
2015-09-13 6
 
0.1%
2017-12-17 6
 
0.1%
Other values (2395) 4182
97.9%
(Missing) 28
 
0.7%
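
This variable is typed as Categorical, yet every non-missing value is a 10-character `YYYY-MM-DD` string, which is why it triggers the high-cardinality alert. A minimal sketch of parsing such values into real dates before analysis, using only the standard library, is given below; `parse_iso_dates` is a hypothetical helper and the handling of missing entries as `None` is an assumption.

```python
from collections import Counter
from datetime import date


def parse_iso_dates(values):
    """Parse YYYY-MM-DD strings; missing or empty entries become None."""
    parsed = []
    for value in values:
        parsed.append(date.fromisoformat(value) if value else None)
    return parsed


# Values taken from the sample rows and common values above, plus one gap.
sample = ["2008-08-15", "2008-05-29", None, "2015-01-08"]
dates = parse_iso_dates(sample)

# Once parsed, the column can be summarised by year instead of by raw string.
per_year = Counter(d.year for d in dates if d is not None)
print(per_year)
```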

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2015-01-08 7
 
0.2%
2015-05-13 7
 
0.2%
2015-09-13 6
 
0.1%
2016-10-31 6
 
0.1%
2017-07-20 6
 
0.1%
2017-04-11 6
 
0.1%
2017-12-17 6
 
0.1%
2016-05-31 6
 
0.1%
2017-03-06 6
 
0.1%
2015-11-12 6
 
0.1%
Other values (2395) 4182
98.5%

Most occurring characters

ValueCountFrequency (%)
0 9859
23.2%
- 8488
20.0%
1 8134
19.2%
2 7279
17.2%
6 1458
 
3.4%
7 1407
 
3.3%
3 1396
 
3.3%
5 1373
 
3.2%
8 1080
 
2.5%
4 1079
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 33952
80.0%
Dash Punctuation 8488
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 9859
29.0%
1 8134
24.0%
2 7279
21.4%
6 1458
 
4.3%
7 1407
 
4.1%
3 1396
 
4.1%
5 1373
 
4.0%
8 1080
 
3.2%
4 1079
 
3.2%
9 887
 
2.6%
Dash Punctuation
ValueCountFrequency (%)
- 8488
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 42440
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 9859
23.2%
- 8488
20.0%
1 8134
19.2%
2 7279
17.2%
6 1458
 
3.4%
7 1407
 
3.3%
3 1396
 
3.3%
5 1373
 
3.2%
8 1080
 
2.5%
4 1079
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 42440
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 9859
23.2%
- 8488
20.0%
1 8134
19.2%
2 7279
17.2%
6 1458
 
3.4%
7 1407
 
3.3%
3 1396
 
3.3%
5 1373
 
3.2%
8 1080
 
2.5%
4 1079
 
2.5%

data_do_tratamento_2
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct322
Distinct (%)93.6%
Missing3928
Missing (%)91.9%
Memory size33.5 KiB
2017-10-25
 
3
2016-05-14
 
3
2017-10-23
 
2
2013-12-26
 
2
2012-12-05
 
2
Other values (317)
332 

Length

Max length10
Median length10
Mean length10
Min length10

Characters and Unicode

Total characters3440
Distinct characters11
Distinct categories2
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique302
Unique (%)87.8%

Sample

1st row2014-06-17
2nd row2010-03-22
3rd row2016-08-24
4th row2007-12-06
5th row2011-05-10

Common Values

ValueCountFrequency (%)
2017-10-25 3
 
0.1%
2016-05-14 3
 
0.1%
2017-10-23 2
 
< 0.1%
2013-12-26 2
 
< 0.1%
2012-12-05 2
 
< 0.1%
2018-06-03 2
 
< 0.1%
2013-02-23 2
 
< 0.1%
2017-09-01 2
 
< 0.1%
2017-06-10 2
 
< 0.1%
2013-08-15 2
 
< 0.1%
Other values (312) 322
 
7.5%
(Missing) 3928
91.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2017-10-25 3
 
0.9%
2016-05-14 3
 
0.9%
2017-01-12 2
 
0.6%
2014-08-07 2
 
0.6%
2017-12-09 2
 
0.6%
2014-09-25 2
 
0.6%
2015-06-13 2
 
0.6%
2016-03-21 2
 
0.6%
2015-06-16 2
 
0.6%
2016-08-31 2
 
0.6%
Other values (312) 322
93.6%

Most occurring characters

ValueCountFrequency (%)
0 799
23.2%
- 688
20.0%
1 635
18.5%
2 585
17.0%
7 133
 
3.9%
6 113
 
3.3%
5 106
 
3.1%
3 103
 
3.0%
8 103
 
3.0%
4 88
 
2.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 2752
80.0%
Dash Punctuation 688
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 799
29.0%
1 635
23.1%
2 585
21.3%
7 133
 
4.8%
6 113
 
4.1%
5 106
 
3.9%
3 103
 
3.7%
8 103
 
3.7%
4 88
 
3.2%
9 87
 
3.2%
Dash Punctuation
ValueCountFrequency (%)
- 688
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 3440
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 799
23.2%
- 688
20.0%
1 635
18.5%
2 585
17.0%
7 133
 
3.9%
6 113
 
3.3%
5 106
 
3.1%
3 103
 
3.0%
8 103
 
3.0%
4 88
 
2.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3440
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 799
23.2%
- 688
20.0%
1 635
18.5%
2 585
17.0%
7 133
 
3.9%
6 113
 
3.3%
5 106
 
3.1%
3 103
 
3.0%
8 103
 
3.0%
4 88
 
2.6%
Distinct10
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size33.5 KiB
Outras combinações
1807 
Cirurgia + Radio + Quimio + Hormonio
892 
Cirurgia + Radio + Quimio
578 
Quimioterapia
335 
Radioterapia + Quimioterapia
333 
Other values (5)
327 

Length

Max length: 36
Median length: 28
Mean length: 23.212781
Min length: 8

Characters and Unicode

Total characters: 99165
Distinct characters: 26
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
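
As a rough illustration of the character-level breakdown reported for this variable, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The helper `category_counts`, the `CATEGORY_NAMES` mapping, and the two sample strings are assumptions made for illustration; this is not the profiler's own implementation.

```python
import unicodedata
from collections import Counter

# Readable names for the general-category codes that appear in this report.
# Codes not listed here are reported by their two-letter abbreviation.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Sm": "Math Symbol",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        # Normalise to NFC so accented letters such as "ç" are single code points.
        for ch in unicodedata.normalize("NFC", value):
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Two values taken from the "Common Values" list of this variable.
sample = ["Cirurgia + Radio + Quimio", "Outras combinações"]
counts = category_counts(sample)
for name, count in counts.most_common():
    print(f"{name}: {count}")
```

The `+` separators fall into the Math Symbol category and the spaces into Space Separator, which is why a treatment-combination column reports four distinct categories.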

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Cirurgia + Radio + Quimio + Hormonio
2nd row: Cirurgia + Quimioterapia
3rd row: Outras combinações
4th row: Outras combinações
5th row: Cirurgia + Radio + Quimio

Common Values

ValueCountFrequency (%)
Outras combinações 1807
42.3%
Cirurgia + Radio + Quimio + Hormonio 892
20.9%
Cirurgia + Radio + Quimio 578
 
13.5%
Quimioterapia 335
 
7.8%
Radioterapia + Quimioterapia 333
 
7.8%
Cirurgia + Quimioterapia 165
 
3.9%
Cirurgia 51
 
1.2%
Nenhum tratamento 45
 
1.1%
Cirurgia + Radioterapia 43
 
1.0%
Radioterapia 23
 
0.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
4373
29.4%
outras 1807
12.2%
combinações 1807
12.2%
cirurgia 1729
 
11.6%
radio 1470
 
9.9%
quimio 1470
 
9.9%
hormonio 892
 
6.0%
quimioterapia 833
 
5.6%
radioterapia 399
 
2.7%
nenhum 45
 
0.3%

Most occurring characters

ValueCountFrequency (%)
i 13864
14.0%
10598
 
10.7%
a 9766
 
9.8%
o 8700
 
8.8%
r 7434
 
7.5%
u 5884
 
5.9%
m 5092
 
5.1%
+ 4373
 
4.4%
s 3614
 
3.6%
t 3174
 
3.2%
Other values (16) 26666
26.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 75549
76.2%
Space Separator 10598
 
10.7%
Uppercase Letter 8645
 
8.7%
Math Symbol 4373
 
4.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 13864
18.4%
a 9766
12.9%
o 8700
11.5%
r 7434
9.8%
u 5884
7.8%
m 5092
 
6.7%
s 3614
 
4.8%
t 3174
 
4.2%
e 3129
 
4.1%
n 2789
 
3.7%
Other values (8) 12103
16.0%
Uppercase Letter
ValueCountFrequency (%)
Q 2303
26.6%
R 1869
21.6%
O 1807
20.9%
C 1729
20.0%
H 892
 
10.3%
N 45
 
0.5%
Space Separator
ValueCountFrequency (%)
10598
100.0%
Math Symbol
ValueCountFrequency (%)
+ 4373
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 84194
84.9%
Common 14971
 
15.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 13864
16.5%
a 9766
11.6%
o 8700
10.3%
r 7434
 
8.8%
u 5884
 
7.0%
m 5092
 
6.0%
s 3614
 
4.3%
t 3174
 
3.8%
e 3129
 
3.7%
n 2789
 
3.3%
Other values (14) 20748
24.6%
Common
ValueCountFrequency (%)
10598
70.8%
+ 4373
29.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 95551
96.4%
None 3614
 
3.6%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 13864
14.5%
10598
11.1%
a 9766
10.2%
o 8700
 
9.1%
r 7434
 
7.8%
u 5884
 
6.2%
m 5092
 
5.3%
+ 4373
 
4.6%
s 3614
 
3.8%
t 3174
 
3.3%
Other values (14) 23052
24.1%
None
ValueCountFrequency (%)
ç 1807
50.0%
õ 1807
50.0%
Distinct: 10
Distinct (%): 2.7%
Missing: 3903
Missing (%): 91.4%
Memory size: 33.5 KiB
Outras combinações
113 
Cirurgia
103 
Quimioterapia
40 
Cirurgia + Radio + Quimio + Hormonio
27 
Nenhum tratamento
27 
Other values (5)
59 

Length

Max length: 36
Median length: 28
Mean length: 16.815718
Min length: 8

Characters and Unicode

Total characters: 6205
Distinct characters: 26
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Quimioterapia
2nd row: Cirurgia + Quimioterapia
3rd row: Cirurgia + Radio + Quimio + Hormonio
4th row: Nenhum tratamento
5th row: Cirurgia

Common Values

ValueCountFrequency (%)
Outras combinações 113
 
2.6%
Cirurgia 103
 
2.4%
Quimioterapia 40
 
0.9%
Cirurgia + Radio + Quimio + Hormonio 27
 
0.6%
Nenhum tratamento 27
 
0.6%
Cirurgia + Quimioterapia 19
 
0.4%
Cirurgia + Radio + Quimio 13
 
0.3%
Radioterapia + Quimioterapia 12
 
0.3%
Cirurgia + Radioterapia 9
 
0.2%
Radioterapia 6
 
0.1%
(Missing) 3903
91.4%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
cirurgia 171
21.3%
147
18.3%
outras 113
14.1%
combinações 113
14.1%
quimioterapia 71
8.8%
radio 40
 
5.0%
quimio 40
 
5.0%
hormonio 27
 
3.4%
nenhum 27
 
3.4%
tratamento 27
 
3.4%

Most occurring characters

ValueCountFrequency (%)
i 869
14.0%
a 714
11.5%
r 607
 
9.8%
434
 
7.0%
u 422
 
6.8%
o 399
 
6.4%
m 305
 
4.9%
t 292
 
4.7%
e 265
 
4.3%
s 226
 
3.6%
Other values (16) 1672
26.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5108
82.3%
Uppercase Letter 516
 
8.3%
Space Separator 434
 
7.0%
Math Symbol 147
 
2.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 869
17.0%
a 714
14.0%
r 607
11.9%
u 422
8.3%
o 399
7.8%
m 305
 
6.0%
t 292
 
5.7%
e 265
 
5.2%
s 226
 
4.4%
n 194
 
3.8%
Other values (8) 815
16.0%
Uppercase Letter
ValueCountFrequency (%)
C 171
33.1%
O 113
21.9%
Q 111
21.5%
R 67
 
13.0%
H 27
 
5.2%
N 27
 
5.2%
Space Separator
ValueCountFrequency (%)
434
100.0%
Math Symbol
ValueCountFrequency (%)
+ 147
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5624
90.6%
Common 581
 
9.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 869
15.5%
a 714
12.7%
r 607
10.8%
u 422
 
7.5%
o 399
 
7.1%
m 305
 
5.4%
t 292
 
5.2%
e 265
 
4.7%
s 226
 
4.0%
n 194
 
3.4%
Other values (14) 1331
23.7%
Common
ValueCountFrequency (%)
434
74.7%
+ 147
 
25.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5979
96.4%
None 226
 
3.6%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 869
14.5%
a 714
11.9%
r 607
10.2%
434
 
7.3%
u 422
 
7.1%
o 399
 
6.7%
m 305
 
5.1%
t 292
 
4.9%
e 265
 
4.4%
s 226
 
3.8%
Other values (14) 1446
24.2%
None
ValueCountFrequency (%)
ç 113
50.0%
õ 113
50.0%

ano_do_diagnostico_1
Real number (ℝ)

Distinct: 13
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2014.3897
Minimum: 2008
Maximum: 2020
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.5 KiB

Quantile statistics

Minimum: 2008
5-th percentile: 2010
Q1: 2012
median: 2015
Q3: 2017
95-th percentile: 2018
Maximum: 2020
Range: 12
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 2.6955947
Coefficient of variation (CV): 0.0013381694
Kurtosis: -0.7535391
Mean: 2014.3897
Median Absolute Deviation (MAD): 2
Skewness: -0.1738211
Sum: 8605473
Variance: 7.2662306
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=13)
ValueCountFrequency (%)
2017 646
15.1%
2016 645
15.1%
2015 602
14.1%
2011 481
11.3%
2013 426
10.0%
2012 410
9.6%
2014 314
7.4%
2018 264
6.2%
2010 193
 
4.5%
2020 115
 
2.7%
Other values (3) 176
 
4.1%
ValueCountFrequency (%)
2008 40
 
0.9%
2009 89
 
2.1%
2010 193
 
4.5%
2011 481
11.3%
2012 410
9.6%
2013 426
10.0%
2014 314
7.4%
2015 602
14.1%
2016 645
15.1%
2017 646
15.1%
ValueCountFrequency (%)
2020 115
 
2.7%
2019 47
 
1.1%
2018 264
6.2%
2017 646
15.1%
2016 645
15.1%
2015 602
14.1%
2014 314
7.4%
2013 426
10.0%
2012 410
9.6%
2011 481
11.3%
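
Several of the derived statistics reported for ano_do_diagnostico_1 follow directly from the others, which allows a quick consistency check. The sketch below recomputes them from the figures printed in this report; the variable names are illustrative, and the tolerance only accounts for rounding in the printed values.

```python
import math

# Figures copied from the ano_do_diagnostico_1 summary in this report.
n_rows = 4272             # number of observations; this variable has no missing values
total = 8605473           # Sum
mean = 2014.3897          # Mean
std = 2.6955947           # Standard deviation
q1, q3 = 2012, 2017       # Q1 and Q3
minimum, maximum = 2008, 2020

# Quantities that can be derived from the figures above.
derived = {
    "mean": total / n_rows,        # mean = sum / count
    "variance": std ** 2,          # variance = standard deviation squared
    "cv": std / mean,              # coefficient of variation = std / mean
    "iqr": q3 - q1,                # interquartile range
    "range": maximum - minimum,
}

# The corresponding values as printed in the report.
reported = {
    "mean": 2014.3897,
    "variance": 7.2662306,
    "cv": 0.0013381694,
    "iqr": 5,
    "range": 12,
}

for name, value in derived.items():
    ok = math.isclose(value, reported[name], rel_tol=1e-6)
    print(f"{name}: derived={value:.10g} reported={reported[name]} match={ok}")
```

The very small coefficient of variation is expected for a calendar-year variable: the spread of a few years is tiny relative to a mean near 2014, so the CV says little about variability here.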

ano_do_diagnostico_2
Real number (ℝ)

Distinct: 13
Distinct (%): 3.5%
Missing: 3903
Missing (%): 91.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.8293
Minimum: 2008
Maximum: 2020
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.5 KiB

Quantile statistics

Minimum: 2008
5-th percentile: 2011
Q1: 2014
median: 2016
Q3: 2018
95-th percentile: 2020
Maximum: 2020
Range: 12
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.5687524
Coefficient of variation (CV): 0.0012742907
Kurtosis: -0.021503999
Mean: 2015.8293
Median Absolute Deviation (MAD): 2
Skewness: -0.61985561
Sum: 743841
Variance: 6.5984889
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=13)
ValueCountFrequency (%)
2017 70
 
1.6%
2018 56
 
1.3%
2016 54
 
1.3%
2015 36
 
0.8%
2014 35
 
0.8%
2013 31
 
0.7%
2019 27
 
0.6%
2020 20
 
0.5%
2012 14
 
0.3%
2011 14
 
0.3%
Other values (3) 12
 
0.3%
(Missing) 3903
91.4%
ValueCountFrequency (%)
2008 2
 
< 0.1%
2009 5
 
0.1%
2010 5
 
0.1%
2011 14
 
0.3%
2012 14
 
0.3%
2013 31
0.7%
2014 35
0.8%
2015 36
0.8%
2016 54
1.3%
2017 70
1.6%
ValueCountFrequency (%)
2020 20
 
0.5%
2019 27
 
0.6%
2018 56
1.3%
2017 70
1.6%
2016 54
1.3%
2015 36
0.8%
2014 35
0.8%
2013 31
0.7%
2012 14
 
0.3%
2011 14
 
0.3%
Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.5 KiB
Esquerda
2180 
Direita
1966 
não se aplica
 
126

Length

Max length: 13
Median length: 8
Mean length: 7.6872659
Min length: 7

Characters and Unicode

Total characters: 32840
Distinct characters: 18
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Esquerda
2nd row: Esquerda
3rd row: Esquerda
4th row: Esquerda
5th row: Direita

Common Values

ValueCountFrequency (%)
Esquerda 2180
51.0%
Direita 1966
46.0%
não se aplica 126
 
2.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
esquerda 2180
48.2%
direita 1966
43.5%
não 126
 
2.8%
se 126
 
2.8%
aplica 126
 
2.8%

Most occurring characters

ValueCountFrequency (%)
a 4398
13.4%
e 4272
13.0%
r 4146
12.6%
i 4058
12.4%
s 2306
7.0%
d 2180
6.6%
E 2180
6.6%
u 2180
6.6%
q 2180
6.6%
D 1966
6.0%
Other values (8) 2974
9.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 28442
86.6%
Uppercase Letter 4146
 
12.6%
Space Separator 252
 
0.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 4398
15.5%
e 4272
15.0%
r 4146
14.6%
i 4058
14.3%
s 2306
8.1%
d 2180
7.7%
u 2180
7.7%
q 2180
7.7%
t 1966
6.9%
n 126
 
0.4%
Other values (5) 630
 
2.2%
Uppercase Letter
ValueCountFrequency (%)
E 2180
52.6%
D 1966
47.4%
Space Separator
ValueCountFrequency (%)
252
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 32588
99.2%
Common 252
 
0.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 4398
13.5%
e 4272
13.1%
r 4146
12.7%
i 4058
12.5%
s 2306
7.1%
d 2180
6.7%
E 2180
6.7%
u 2180
6.7%
q 2180
6.7%
D 1966
6.0%
Other values (7) 2722
8.4%
Common
ValueCountFrequency (%)
252
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 32714
99.6%
None 126
 
0.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 4398
13.4%
e 4272
13.1%
r 4146
12.7%
i 4058
12.4%
s 2306
7.0%
d 2180
6.7%
E 2180
6.7%
u 2180
6.7%
q 2180
6.7%
D 1966
6.0%
Other values (7) 2848
8.7%
None
ValueCountFrequency (%)
ã 126
100.0%
Distinct: 4
Distinct (%): 1.1%
Missing: 3903
Missing (%): 91.4%
Memory size: 33.5 KiB
não se aplica
142 
Esquerda
125 
Direita
100 
Bilateral
 
2

Length

Max length: 13
Median length: 9
Mean length: 9.6585366
Min length: 7

Characters and Unicode

Total characters: 3564
Distinct characters: 19
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: não se aplica
2nd row: não se aplica
3rd row: Direita
4th row: não se aplica
5th row: não se aplica

Common Values

ValueCountFrequency (%)
não se aplica 142
 
3.3%
Esquerda 125
 
2.9%
Direita 100
 
2.3%
Bilateral 2
 
< 0.1%
(Missing) 3903
91.4%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
não 142
21.7%
se 142
21.7%
aplica 142
21.7%
esquerda 125
19.1%
direita 100
15.3%
bilateral 2
 
0.3%

Most occurring characters

ValueCountFrequency (%)
a 513
14.4%
e 369
 
10.4%
i 344
 
9.7%
284
 
8.0%
s 267
 
7.5%
r 227
 
6.4%
l 146
 
4.1%
ã 142
 
4.0%
c 142
 
4.0%
n 142
 
4.0%
Other values (9) 988
27.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3053
85.7%
Space Separator 284
 
8.0%
Uppercase Letter 227
 
6.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 513
16.8%
e 369
12.1%
i 344
11.3%
s 267
8.7%
r 227
 
7.4%
l 146
 
4.8%
ã 142
 
4.7%
c 142
 
4.7%
n 142
 
4.7%
p 142
 
4.7%
Other values (5) 619
20.3%
Uppercase Letter
ValueCountFrequency (%)
E 125
55.1%
D 100
44.1%
B 2
 
0.9%
Space Separator
ValueCountFrequency (%)
284
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3280
92.0%
Common 284
 
8.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 513
15.6%
e 369
11.2%
i 344
10.5%
s 267
 
8.1%
r 227
 
6.9%
l 146
 
4.5%
ã 142
 
4.3%
c 142
 
4.3%
n 142
 
4.3%
p 142
 
4.3%
Other values (8) 846
25.8%
Common
ValueCountFrequency (%)
284
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3422
96.0%
None 142
 
4.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 513
15.0%
e 369
10.8%
i 344
10.1%
284
 
8.3%
s 267
 
7.8%
r 227
 
6.6%
l 146
 
4.3%
c 142
 
4.1%
n 142
 
4.1%
p 142
 
4.1%
Other values (8) 846
24.7%
None
ValueCountFrequency (%)
ã 142
100.0%

data_de_recidiva_1
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct: 1021
Distinct (%): 81.7%
Missing: 3023
Missing (%): 70.8%
Memory size: 33.5 KiB
2017-07-09
 
4
2016-08-08
 
4
2018-09-03
 
4
2014-11-06
 
4
2018-11-16
 
4
Other values (1016)
1229 
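
The values in this column (for example 2017-07-09) look like ISO 8601 calendar dates stored as text, which would explain why the column is profiled as a high-cardinality categorical rather than as a date. A minimal sketch of converting such strings before analysis, assuming blank or absent entries should stay missing; `parse_iso_date` is a hypothetical helper, not part of the report or of any profiling library:

```python
from datetime import date
from typing import List, Optional


def parse_iso_date(text: Optional[str]) -> Optional[date]:
    """Return a date for 'YYYY-MM-DD' text, or None for missing or blank input."""
    if text is None or not text.strip():
        return None
    # date.fromisoformat raises ValueError on malformed input, which surfaces
    # bad records instead of silently coercing them.
    return date.fromisoformat(text.strip())


# Illustrative input: three dates seen in this column plus two missing entries.
sample: List[Optional[str]] = ["2014-07-19", "2010-07-15", "", None, "2016-02-29"]
parsed = [parse_iso_date(value) for value in sample]

n_missing = sum(1 for value in parsed if value is None)
print(parsed)
print("missing:", n_missing)
```

Once parsed, the values can be sorted, grouped by year, or subtracted from another date column, none of which is meaningful while they are treated as category labels.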

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 12490
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 835
Unique (%): 66.9%

Sample

1st row: 2014-07-19
2nd row: 2010-07-15
3rd row: 2012-12-19
4th row: 2016-02-29
5th row: 2009-08-14

Common Values

ValueCountFrequency (%)
2017-07-09 4
 
0.1%
2016-08-08 4
 
0.1%
2018-09-03 4
 
0.1%
2014-11-06 4
 
0.1%
2018-11-16 4
 
0.1%
2017-11-18 4
 
0.1%
2014-08-21 3
 
0.1%
2017-10-17 3
 
0.1%
2017-11-21 3
 
0.1%
2018-08-24 3
 
0.1%
Other values (1011) 1213
28.4%
(Missing) 3023
70.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2017-07-09 4
 
0.3%
2018-11-16 4
 
0.3%
2017-11-18 4
 
0.3%
2016-08-08 4
 
0.3%
2014-11-06 4
 
0.3%
2018-09-03 4
 
0.3%
2017-10-26 3
 
0.2%
2017-08-20 3
 
0.2%
2014-10-04 3
 
0.2%
2019-01-30 3
 
0.2%
Other values (1011) 1213
97.1%

Most occurring characters

ValueCountFrequency (%)
0 2813
22.5%
- 2498
20.0%
1 2336
18.7%
2 2077
16.6%
7 473
 
3.8%
8 461
 
3.7%
3 402
 
3.2%
6 389
 
3.1%
5 374
 
3.0%
9 336
 
2.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 9992
80.0%
Dash Punctuation 2498
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 2813
28.2%
1 2336
23.4%
2 2077
20.8%
7 473
 
4.7%
8 461
 
4.6%
3 402
 
4.0%
6 389
 
3.9%
5 374
 
3.7%
9 336
 
3.4%
4 331
 
3.3%
Dash Punctuation
ValueCountFrequency (%)
- 2498
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 12490
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 2813
22.5%
- 2498
20.0%
1 2336
18.7%
2 2077
16.6%
7 473
 
3.8%
8 461
 
3.7%
3 402
 
3.2%
6 389
 
3.1%
5 374
 
3.0%
9 336
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 12490
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 2813
22.5%
- 2498
20.0%
1 2336
18.7%
2 2077
16.6%
7 473
 
3.8%
8 461
 
3.7%
3 402
 
3.2%
6 389
 
3.1%
5 374
 
3.0%
9 336
 
2.7%

data_de_recidiva_2
Categorical

MISSING  UNIFORM 

Distinct: 45
Distinct (%): 97.8%
Missing: 4226
Missing (%): 98.9%
Memory size: 33.5 KiB
2019-02-16
 
2
2017-10-26
 
1
2019-02-03
 
1
2019-02-09
 
1
2018-04-16
 
1
Other values (40)
40 

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 460
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 44
Unique (%): 95.7%

Sample

1st row: 2011-08-04
2nd row: 2012-04-24
3rd row: 2017-10-11
4th row: 2017-03-24
5th row: 2018-11-24

Common Values

ValueCountFrequency (%)
2019-02-16 2
 
< 0.1%
2017-10-26 1
 
< 0.1%
2019-02-03 1
 
< 0.1%
2019-02-09 1
 
< 0.1%
2018-04-16 1
 
< 0.1%
2019-05-06 1
 
< 0.1%
2017-07-27 1
 
< 0.1%
2017-03-14 1
 
< 0.1%
2020-11-05 1
 
< 0.1%
2015-10-24 1
 
< 0.1%
Other values (35) 35
 
0.8%
(Missing) 4226
98.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2019-02-16 2
 
4.3%
2015-10-23 1
 
2.2%
2012-04-24 1
 
2.2%
2017-10-11 1
 
2.2%
2017-03-24 1
 
2.2%
2018-11-24 1
 
2.2%
2012-09-07 1
 
2.2%
2012-03-22 1
 
2.2%
2015-04-27 1
 
2.2%
2011-04-04 1
 
2.2%
Other values (35) 35
76.1%

Most occurring characters

ValueCountFrequency (%)
0 104
22.6%
- 92
20.0%
1 82
17.8%
2 81
17.6%
7 20
 
4.3%
9 18
 
3.9%
5 16
 
3.5%
4 15
 
3.3%
6 14
 
3.0%
3 10
 
2.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 368
80.0%
Dash Punctuation 92
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 104
28.3%
1 82
22.3%
2 81
22.0%
7 20
 
5.4%
9 18
 
4.9%
5 16
 
4.3%
4 15
 
4.1%
6 14
 
3.8%
3 10
 
2.7%
8 8
 
2.2%
Dash Punctuation
ValueCountFrequency (%)
- 92
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 460
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 104
22.6%
- 92
20.0%
1 82
17.8%
2 81
17.6%
7 20
 
4.3%
9 18
 
3.9%
5 16
 
3.5%
4 15
 
3.3%
6 14
 
3.0%
3 10
 
2.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 460
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 104
22.6%
- 92
20.0%
1 82
17.8%
2 81
17.6%
7 20
 
4.3%
9 18
 
3.9%
5 16
 
3.5%
4 15
 
3.3%
6 14
 
3.0%
3 10
 
2.2%
Distinct: 821
Distinct (%): 65.7%
Missing: 3023
Missing (%): 70.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 633.98559
Minimum: 0
Maximum: 3462
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 33.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 68
Q1: 256
median: 489
Q3: 868
95-th percentile: 1708.2
Maximum: 3462
Range: 3462
Interquartile range (IQR): 612

Descriptive statistics

Standard deviation: 535.467
Coefficient of variation (CV): 0.84460438
Kurtosis: 4.0981657
Mean: 633.98559
Median Absolute Deviation (MAD): 281
Skewness: 1.7732261
Sum: 791848
Variance: 286724.91
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
276 6
 
0.1%
188 5
 
0.1%
777 5
 
0.1%
251 5
 
0.1%
309 5
 
0.1%
195 5
 
0.1%
217 4
 
0.1%
719 4
 
0.1%
345 4
 
0.1%
248 4
 
0.1%
Other values (811) 1202
 
28.1%
(Missing) 3023
70.8%
ValueCountFrequency (%)
0 1
 
< 0.1%
1 1
 
< 0.1%
3 1
 
< 0.1%
7 2
< 0.1%
8 1
 
< 0.1%
10 1
 
< 0.1%
12 3
0.1%
18 1
 
< 0.1%
19 1
 
< 0.1%
20 1
 
< 0.1%
ValueCountFrequency (%)
3462 1
< 0.1%
3283 1
< 0.1%
3091 1
< 0.1%
3089 1
< 0.1%
3069 1
< 0.1%
2933 1
< 0.1%
2870 1
< 0.1%
2850 1
< 0.1%
2739 2
< 0.1%
2729 1
< 0.1%
Distinct: 45
Distinct (%): 97.8%
Missing: 4226
Missing (%): 98.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 614.97826
Minimum: 0
Maximum: 2977
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 33.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 56.25
Q1: 231.25
median: 467
Q3: 861.5
95-th percentile: 1590.75
Maximum: 2977
Range: 2977
Interquartile range (IQR): 630.25

Descriptive statistics

Standard deviation: 560.80032
Coefficient of variation (CV): 0.91190267
Kurtosis: 5.9611467
Mean: 614.97826
Median Absolute Deviation (MAD): 271.5
Skewness: 2.025504
Sum: 28289
Variance: 314497
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=45)
ValueCountFrequency (%)
263 2
 
< 0.1%
676 1
 
< 0.1%
244 1
 
< 0.1%
474 1
 
< 0.1%
1054 1
 
< 0.1%
353 1
 
< 0.1%
1560 1
 
< 0.1%
1696 1
 
< 0.1%
415 1
 
< 0.1%
407 1
 
< 0.1%
Other values (35) 35
 
0.8%
(Missing) 4226
98.9%
ValueCountFrequency (%)
0 1
< 0.1%
24 1
< 0.1%
49 1
< 0.1%
78 1
< 0.1%
115 1
< 0.1%
135 1
< 0.1%
155 1
< 0.1%
158 1
< 0.1%
194 1
< 0.1%
199 1
< 0.1%
ValueCountFrequency (%)
2977 1
< 0.1%
1696 1
< 0.1%
1601 1
< 0.1%
1560 1
< 0.1%
1457 1
< 0.1%
1054 1
< 0.1%
1044 1
< 0.1%
996 1
< 0.1%
906 1
< 0.1%
900 1
< 0.1%
Distinct: 23
Distinct (%): 2.3%
Missing: 3282
Missing (%): 76.8%
Memory size: 33.5 KiB
C34 - Bronquios e Pulmoes
258 
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações
169 
C22 - Fígado e Das Vias Biliares Intra-hepáticas
142 
C71 - Encefalo
131 
C40 - Ossos e Cartilagens Articulares Dos Membros
87 
Other values (18)
203 

Length

Max length: 100
Median length: 59
Mean length: 39.748485
Min length: 10

Characters and Unicode

Total characters: 39351
Distinct characters: 64
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7
Unique (%): 0.7%

Sample

1st row: C34 - Bronquios e Pulmoes
2nd row: C38 - Coração, Mediastino e Pleura,
3rd row: C71 - Encefalo
4th row: C71 - Encefalo
5th row: C22 - Fígado e Das Vias Biliares Intra-hepáticas

Common Values

ValueCountFrequency (%)
C34 - Bronquios e Pulmoes 258
 
6.0%
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações 169
 
4.0%
C22 - Fígado e Das Vias Biliares Intra-hepáticas 142
 
3.3%
C71 - Encefalo 131
 
3.1%
C40 - Ossos e Cartilagens Articulares Dos Membros 87
 
2.0%
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos 86
 
2.0%
C38 - Coração, Mediastino e Pleura, 36
 
0.8%
C48 - Tecidos Moles do Retroperitônio e do Peritônio 21
 
0.5%
C50 - Mama 15
 
0.4%
C49 - Tecido Conjuntivo e de Outros Tecidos Moles 13
 
0.3%
Other values (13) 32
 
0.7%
(Missing) 3282
76.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
990
 
14.7%
e 818
 
12.2%
das 311
 
4.6%
pulmoes 258
 
3.8%
c34 258
 
3.8%
bronquios 258
 
3.8%
ossos 256
 
3.8%
cartilagens 256
 
3.8%
articulares 256
 
3.8%
de 186
 
2.8%
Other values (78) 2870
42.7%

Most occurring characters

ValueCountFrequency (%)
5727
14.6%
s 3551
 
9.0%
a 2910
 
7.4%
e 2906
 
7.4%
o 2385
 
6.1%
i 2278
 
5.8%
r 1839
 
4.7%
l 1413
 
3.6%
C 1304
 
3.3%
t 1208
 
3.1%
Other values (54) 13830
35.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 25762
65.5%
Space Separator 5727
 
14.6%
Uppercase Letter 4663
 
11.8%
Decimal Number 1980
 
5.0%
Dash Punctuation 1139
 
2.9%
Other Punctuation 78
 
0.2%
Open Punctuation 1
 
< 0.1%
Close Punctuation 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 3551
13.8%
a 2910
11.3%
e 2906
11.3%
o 2385
9.3%
i 2278
8.8%
r 1839
 
7.1%
l 1413
 
5.5%
t 1208
 
4.7%
n 1201
 
4.7%
u 1103
 
4.3%
Other values (21) 4968
19.3%
Uppercase Letter
ValueCountFrequency (%)
C 1304
28.0%
D 488
 
10.5%
O 447
 
9.6%
B 401
 
8.6%
P 327
 
7.0%
A 257
 
5.5%
L 255
 
5.5%
E 226
 
4.8%
M 185
 
4.0%
V 143
 
3.1%
Other values (8) 630
13.5%
Decimal Number
ValueCountFrequency (%)
4 564
28.5%
7 313
15.8%
1 306
15.5%
3 295
14.9%
2 292
14.7%
0 108
 
5.5%
8 58
 
2.9%
5 20
 
1.0%
9 13
 
0.7%
6 11
 
0.6%
Space Separator
ValueCountFrequency (%)
5727
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1139
100.0%
Other Punctuation
ValueCountFrequency (%)
, 78
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 30425
77.3%
Common 8926
 
22.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 3551
 
11.7%
a 2910
 
9.6%
e 2906
 
9.6%
o 2385
 
7.8%
i 2278
 
7.5%
r 1839
 
6.0%
l 1413
 
4.6%
C 1304
 
4.3%
t 1208
 
4.0%
n 1201
 
3.9%
Other values (39) 9430
31.0%
Common
ValueCountFrequency (%)
5727
64.2%
- 1139
 
12.8%
4 564
 
6.3%
7 313
 
3.5%
1 306
 
3.4%
3 295
 
3.3%
2 292
 
3.3%
0 108
 
1.2%
, 78
 
0.9%
8 58
 
0.6%
Other values (5) 46
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 38266
97.2%
None 1085
 
2.8%

Most frequent character per block

ASCII
ValueCountFrequency (%)
5727
15.0%
s 3551
 
9.3%
a 2910
 
7.6%
e 2906
 
7.6%
o 2385
 
6.2%
i 2278
 
6.0%
r 1839
 
4.8%
l 1413
 
3.7%
C 1304
 
3.4%
t 1208
 
3.2%
Other values (46) 12745
33.3%
None
ValueCountFrequency (%)
á 314
28.9%
ç 205
18.9%
õ 169
15.6%
í 143
13.2%
ã 122
 
11.2%
â 88
 
8.1%
ô 42
 
3.9%
é 2
 
0.2%
Distinct: 11
Distinct (%): 26.8%
Missing: 4231
Missing (%): 99.0%
Memory size: 33.5 KiB
C22 - Fígado e Das Vias Biliares Intra-hepáticas
10 
C34 - Bronquios e Pulmoes
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos
C48 - Tecidos Moles do Retroperitônio e do Peritônio
Other values (6)
10 

Length

Max length: 64
Median length: 52
Mean length: 42
Min length: 10

Characters and Unicode

Total characters: 1722
Distinct characters: 60
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): 7.3%

Sample

1st row: C41 - Ossos e Das Cartilagens Articulares de Outras Localizações
2nd row: C77 - Secundária e Não Especificada Dos Gânglios Linfáticos
3rd row: C22 - Fígado e Das Vias Biliares Intra-hepáticas
4th row: C22 - Fígado e Das Vias Biliares Intra-hepáticas
5th row: C34 - Bronquios e Pulmoes

Common Values

ValueCountFrequency (%)
C22 - Fígado e Das Vias Biliares Intra-hepáticas 10
 
0.2%
C34 - Bronquios e Pulmoes 9
 
0.2%
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações 5
 
0.1%
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos 4
 
0.1%
C48 - Tecidos Moles do Retroperitônio e do Peritônio 3
 
0.1%
C71 - Encefalo 3
 
0.1%
C49 - Tecido Conjuntivo e de Outros Tecidos Moles 2
 
< 0.1%
C40 - Ossos e Cartilagens Articulares Dos Membros 2
 
< 0.1%
C74 - Glândula Supra-renal (Glândula Adrenal) 1
 
< 0.1%
C56 - Ovario 1
 
< 0.1%
(Missing) 4231
99.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
41
 
14.0%
e 35
 
11.9%
das 15
 
5.1%
c22 10
 
3.4%
fígado 10
 
3.4%
vias 10
 
3.4%
biliares 10
 
3.4%
intra-hepáticas 10
 
3.4%
c34 9
 
3.1%
bronquios 9
 
3.1%
Other values (37) 134
45.7%

Most occurring characters

ValueCountFrequency (%)
252
14.6%
s 140
 
8.1%
e 127
 
7.4%
a 122
 
7.1%
i 114
 
6.6%
o 107
 
6.2%
r 76
 
4.4%
t 57
 
3.3%
l 56
 
3.3%
n 55
 
3.2%
Other values (50) 616
35.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1130
65.6%
Space Separator 252
 
14.6%
Uppercase Letter 204
 
11.8%
Decimal Number 82
 
4.8%
Dash Punctuation 52
 
3.0%
Open Punctuation 1
 
0.1%
Close Punctuation 1
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 140
12.4%
e 127
11.2%
a 122
10.8%
i 114
10.1%
o 107
9.5%
r 76
 
6.7%
t 57
 
5.0%
l 56
 
5.0%
n 55
 
4.9%
c 48
 
4.2%
Other values (19) 228
20.2%
Uppercase Letter
ValueCountFrequency (%)
C 50
24.5%
D 21
10.3%
B 19
 
9.3%
O 15
 
7.4%
P 12
 
5.9%
F 10
 
4.9%
I 10
 
4.9%
V 10
 
4.9%
L 9
 
4.4%
A 8
 
3.9%
Other values (7) 40
19.6%
Decimal Number
ValueCountFrequency (%)
4 22
26.8%
2 21
25.6%
7 12
14.6%
3 9
11.0%
1 8
 
9.8%
8 3
 
3.7%
0 3
 
3.7%
9 2
 
2.4%
5 1
 
1.2%
6 1
 
1.2%
Space Separator
ValueCountFrequency (%)
252
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 52
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1334
77.5%
Common 388
 
22.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 140
 
10.5%
e 127
 
9.5%
a 122
 
9.1%
i 114
 
8.5%
o 107
 
8.0%
r 76
 
5.7%
t 57
 
4.3%
l 56
 
4.2%
n 55
 
4.1%
C 50
 
3.7%
Other values (36) 430
32.2%
Common
ValueCountFrequency (%)
252
64.9%
- 52
 
13.4%
4 22
 
5.7%
2 21
 
5.4%
7 12
 
3.1%
3 9
 
2.3%
1 8
 
2.1%
8 3
 
0.8%
0 3
 
0.8%
9 2
 
0.5%
Other values (4) 4
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1668
96.9%
None 54
 
3.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
252
15.1%
s 140
 
8.4%
e 127
 
7.6%
a 122
 
7.3%
i 114
 
6.8%
o 107
 
6.4%
r 76
 
4.6%
t 57
 
3.4%
l 56
 
3.4%
n 55
 
3.3%
Other values (43) 562
33.7%
None
ValueCountFrequency (%)
á 18
33.3%
í 10
18.5%
ô 6
 
11.1%
â 6
 
11.1%
ç 5
 
9.3%
õ 5
 
9.3%
ã 4
 
7.4%
Distinct: 21
Distinct (%): 3.9%
Missing: 3737
Missing (%): 87.5%
Memory size: 33.5 KiB
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações
125 
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos
94 
C22 - Fígado e Das Vias Biliares Intra-hepáticas
90 
C34 - Bronquios e Pulmoes
68 
C71 - Encefalo
32 
Other values (16)
126 

Length

Max length: 100
Median length: 59
Mean length: 46.403738
Min length: 10

Characters and Unicode

Total characters: 24826
Distinct characters: 63
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7
Unique (%): 1.3%

Sample

1st row: C50 - Mama
2nd row: C71 - Encefalo
3rd row: C48 - Tecidos Moles do Retroperitônio e do Peritônio
4th row: C41 - Ossos e Das Cartilagens Articulares de Outras Localizações
5th row: C50 - Mama

Common Values

ValueCountFrequency (%)
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações 125
 
2.9%
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos 94
 
2.2%
C22 - Fígado e Das Vias Biliares Intra-hepáticas 90
 
2.1%
C34 - Bronquios e Pulmoes 68
 
1.6%
C71 - Encefalo 32
 
0.7%
C40 - Ossos e Cartilagens Articulares Dos Membros 32
 
0.7%
C38 - Coração, Mediastino e Pleura, 30
 
0.7%
C48 - Tecidos Moles do Retroperitônio e do Peritônio 16
 
0.4%
C50 - Mama 14
 
0.3%
C49 - Tecido Conjuntivo e de Outros Tecidos Moles 12
 
0.3%
Other values (11) 22
 
0.5%
(Missing) 3737
87.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
535
 
13.1%
e 471
 
11.5%
das 215
 
5.3%
ossos 157
 
3.8%
cartilagens 157
 
3.8%
articulares 157
 
3.8%
de 138
 
3.4%
dos 127
 
3.1%
outras 126
 
3.1%
c41 125
 
3.1%
Other values (75) 1881
46.0%

Most occurring characters

ValueCountFrequency (%)
3554
14.3%
s 2216
 
8.9%
a 1989
 
8.0%
e 1777
 
7.2%
i 1636
 
6.6%
o 1354
 
5.5%
r 1113
 
4.5%
t 833
 
3.4%
c 826
 
3.3%
l 818
 
3.3%
Other values (53) 8710
35.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 16599
66.9%
Space Separator 3554
 
14.3%
Uppercase Letter 2904
 
11.7%
Decimal Number 1070
 
4.3%
Dash Punctuation 630
 
2.5%
Other Punctuation 63
 
0.3%
Open Punctuation 3
 
< 0.1%
Close Punctuation 3
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 2216
13.4%
a 1989
12.0%
e 1777
10.7%
i 1636
9.9%
o 1354
8.2%
r 1113
 
6.7%
t 833
 
5.0%
c 826
 
5.0%
l 818
 
4.9%
n 759
 
4.6%
Other values (21) 3278
19.7%
Uppercase Letter
ValueCountFrequency (%)
C 737
25.4%
D 342
11.8%
O 296
10.2%
L 219
 
7.5%
A 161
 
5.5%
B 158
 
5.4%
E 128
 
4.4%
P 120
 
4.1%
M 114
 
3.9%
G 102
 
3.5%
Other values (7) 527
18.1%
Decimal Number
ValueCountFrequency (%)
4 264
24.7%
7 234
21.9%
2 183
17.1%
1 158
14.8%
3 99
 
9.3%
0 56
 
5.2%
8 47
 
4.4%
5 15
 
1.4%
9 12
 
1.1%
6 2
 
0.2%
Space Separator
ValueCountFrequency (%)
3554
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 630
100.0%
Other Punctuation
ValueCountFrequency (%)
, 63
100.0%
Open Punctuation
ValueCountFrequency (%)
( 3
100.0%
Close Punctuation
ValueCountFrequency (%)
) 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 19503
78.6%
Common 5323
 
21.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 2216
 
11.4%
a 1989
 
10.2%
e 1777
 
9.1%
i 1636
 
8.4%
o 1354
 
6.9%
r 1113
 
5.7%
t 833
 
4.3%
c 826
 
4.2%
l 818
 
4.2%
n 759
 
3.9%
Other values (38) 6182
31.7%
Common
ValueCountFrequency (%)
3554
66.8%
- 630
 
11.8%
4 264
 
5.0%
7 234
 
4.4%
2 183
 
3.4%
1 158
 
3.0%
3 99
 
1.9%
, 63
 
1.2%
0 56
 
1.1%
8 47
 
0.9%
Other values (5) 35
 
0.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 23918
96.3%
None 908
 
3.7%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3554
14.9%
s 2216
 
9.3%
a 1989
 
8.3%
e 1777
 
7.4%
i 1636
 
6.8%
o 1354
 
5.7%
r 1113
 
4.7%
t 833
 
3.5%
c 826
 
3.5%
l 818
 
3.4%
Other values (45) 7802
32.6%
None
ValueCountFrequency (%)
á 278
30.6%
ç 155
17.1%
õ 125
13.8%
ã 124
13.7%
â 100
 
11.0%
í 90
 
9.9%
ô 33
 
3.6%
é 3
 
0.3%
Distinct: 10
Distinct (%): 52.6%
Missing: 4253
Missing (%): 99.6%
Memory size: 33.5 KiB

C48 - Tecidos Moles do Retroperitônio e do Peritônio
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos
C22 - Fígado e Das Vias Biliares Intra-hepáticas
C74 - Glândula Supra-renal (Glândula Adrenal)
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações
Other values (5)

Length

Max length: 64
Median length: 52
Mean length: 48.263158
Min length: 14

Characters and Unicode

Total characters: 917
Distinct characters: 61
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
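
As a rough illustration of how such per-character properties can be tallied, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is not the profiler's own implementation: the helper `category_counts` and the `CATEGORY_NAMES` mapping are hypothetical names introduced here, and the mapping covers only the categories that appear in this report.

```python
import unicodedata
from collections import Counter

# Assumed mapping from two-letter general-category codes to the long
# names shown in the report; only the categories seen here are listed.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        # Normalize to NFC so accented letters are single code points.
        for ch in unicodedata.normalize("NFC", value):
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


sample = ["C77 - Secundária e Não Especificada Dos Gânglios Linfáticos"]
counts = category_counts(sample)
for name, n in counts.most_common():
    print(f"{name}: {n}")
```

For this single 59-character value the tally is 41 lowercase letters, 8 spaces, 7 uppercase letters, 2 decimal digits and 1 dash. Script and block names are not exposed by `unicodedata`, so reproducing those tables would need an additional data source.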

Unique

Unique: 5
Unique (%): 26.3%

Sample

1st row: C48 - Tecidos Moles do Retroperitônio e do Peritônio
2nd row: C77 - Secundária e Não Especificada Dos Gânglios Linfáticos
3rd row: C48 - Tecidos Moles do Retroperitônio e do Peritônio
4th row: C49 - Tecido Conjuntivo e de Outros Tecidos Moles
5th row: C22 - Fígado e Das Vias Biliares Intra-hepáticas

Common Values

Value | Count | Frequency (%)
C48 - Tecidos Moles do Retroperitônio e do Peritônio | 5 | 0.1%
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos | 3 | 0.1%
C22 - Fígado e Das Vias Biliares Intra-hepáticas | 2 | < 0.1%
C74 - Glândula Supra-renal (Glândula Adrenal) | 2 | < 0.1%
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações | 2 | < 0.1%
C49 - Tecido Conjuntivo e de Outros Tecidos Moles | 1 | < 0.1%
C64 - Rim, Exceto Pelve Renal | 1 | < 0.1%
C70 - Meninges | 1 | < 0.1%
C40 - Ossos e Cartilagens Articulares Dos Membros | 1 | < 0.1%
C34 - Bronquios e Pulmoes | 1 | < 0.1%
(Missing) | 4253 | 99.6%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot: common values]
ValueCountFrequency (%)
19
 
12.6%
e 15
 
9.9%
do 10
 
6.6%
tecidos 6
 
4.0%
moles 6
 
4.0%
c48 5
 
3.3%
retroperitônio 5
 
3.3%
peritônio 5
 
3.3%
glândula 4
 
2.6%
das 4
 
2.6%
Other values (38) 72
47.7%

Most occurring characters

ValueCountFrequency (%)
132
14.4%
e 76
 
8.3%
o 66
 
7.2%
i 65
 
7.1%
s 59
 
6.4%
a 49
 
5.3%
r 42
 
4.6%
n 38
 
4.1%
d 34
 
3.7%
l 34
 
3.7%
Other values (51) 322
35.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 615
67.1%
Space Separator 132
 
14.4%
Uppercase Letter 104
 
11.3%
Decimal Number 38
 
4.1%
Dash Punctuation 23
 
2.5%
Close Punctuation 2
 
0.2%
Open Punctuation 2
 
0.2%
Other Punctuation 1
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 76
12.4%
o 66
10.7%
i 65
10.6%
s 59
9.6%
a 49
8.0%
r 42
 
6.8%
n 38
 
6.2%
d 34
 
5.5%
l 34
 
5.5%
t 33
 
5.4%
Other values (20) 119
19.3%
Uppercase Letter
ValueCountFrequency (%)
C 23
22.1%
D 8
 
7.7%
M 8
 
7.7%
R 7
 
6.7%
P 7
 
6.7%
T 7
 
6.7%
G 7
 
6.7%
O 6
 
5.8%
S 5
 
4.8%
A 5
 
4.8%
Other values (7) 21
20.2%
Decimal Number
ValueCountFrequency (%)
4 13
34.2%
7 9
23.7%
8 5
 
13.2%
2 4
 
10.5%
0 2
 
5.3%
1 2
 
5.3%
9 1
 
2.6%
6 1
 
2.6%
3 1
 
2.6%
Space Separator
ValueCountFrequency (%)
132
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 23
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2
100.0%
Other Punctuation
ValueCountFrequency (%)
, 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 719
78.4%
Common 198
 
21.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 76
 
10.6%
o 66
 
9.2%
i 65
 
9.0%
s 59
 
8.2%
a 49
 
6.8%
r 42
 
5.8%
n 38
 
5.3%
d 34
 
4.7%
l 34
 
4.7%
t 33
 
4.6%
Other values (37) 223
31.0%
Common
ValueCountFrequency (%)
132
66.7%
- 23
 
11.6%
4 13
 
6.6%
7 9
 
4.5%
8 5
 
2.5%
2 4
 
2.0%
0 2
 
1.0%
1 2
 
1.0%
) 2
 
1.0%
( 2
 
1.0%
Other values (4) 4
 
2.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 883
96.3%
None 34
 
3.7%

Most frequent character per block

ASCII
ValueCountFrequency (%)
132
14.9%
e 76
 
8.6%
o 66
 
7.5%
i 65
 
7.4%
s 59
 
6.7%
a 49
 
5.5%
r 42
 
4.8%
n 38
 
4.3%
d 34
 
3.9%
l 34
 
3.9%
Other values (44) 288
32.6%
None
ValueCountFrequency (%)
ô 10
29.4%
á 8
23.5%
â 7
20.6%
ã 3
 
8.8%
õ 2
 
5.9%
ç 2
 
5.9%
í 2
 
5.9%
Distinct: 19
Distinct (%): 7.3%
Missing: 4013
Missing (%): 93.9%
Memory size: 33.5 KiB

C22 - Fígado e Das Vias Biliares Intra-hepáticas | 48
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações | 46
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos | 44
C34 - Bronquios e Pulmoes | 36
C38 - Coração, Mediastino e Pleura, | 21
Other values (14) | 64

Length

Max length: 100
Median length: 59
Mean length: 45.760618
Min length: 10

Characters and Unicode

Total characters: 11852
Distinct characters: 64
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5
Unique (%): 1.9%

Sample

1st row: C34 - Bronquios e Pulmoes
2nd row: C22 - Fígado e Das Vias Biliares Intra-hepáticas
3rd row: C38 - Coração, Mediastino e Pleura,
4th row: C34 - Bronquios e Pulmoes
5th row: C22 - Fígado e Das Vias Biliares Intra-hepáticas

Common Values

Value | Count | Frequency (%)
C22 - Fígado e Das Vias Biliares Intra-hepáticas | 48 | 1.1%
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações | 46 | 1.1%
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos | 44 | 1.0%
C34 - Bronquios e Pulmoes | 36 | 0.8%
C38 - Coração, Mediastino e Pleura, | 21 | 0.5%
C71 - Encefalo | 14 | 0.3%
C40 - Ossos e Cartilagens Articulares Dos Membros | 13 | 0.3%
C49 - Tecido Conjuntivo e de Outros Tecidos Moles | 9 | 0.2%
C74 - Glândula Supra-renal (Glândula Adrenal) | 8 | 0.2%
C48 - Tecidos Moles do Retroperitônio e do Peritônio | 6 | 0.1%
Other values (9) | 14 | 0.3%
(Missing) | 4013 | 93.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
259
 
13.2%
e 225
 
11.5%
das 94
 
4.8%
ossos 59
 
3.0%
cartilagens 59
 
3.0%
articulares 59
 
3.0%
dos 59
 
3.0%
de 57
 
2.9%
intra-hepáticas 48
 
2.5%
biliares 48
 
2.5%
Other values (68) 990
50.6%

Most occurring characters

ValueCountFrequency (%)
1698
 
14.3%
s 977
 
8.2%
a 938
 
7.9%
e 843
 
7.1%
i 769
 
6.5%
o 658
 
5.6%
r 522
 
4.4%
l 410
 
3.5%
n 391
 
3.3%
t 373
 
3.1%
Other values (54) 4273
36.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 7858
66.3%
Space Separator 1698
 
14.3%
Uppercase Letter 1398
 
11.8%
Decimal Number 518
 
4.4%
Dash Punctuation 317
 
2.7%
Other Punctuation 47
 
0.4%
Open Punctuation 8
 
0.1%
Close Punctuation 8
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 977
12.4%
a 938
11.9%
e 843
10.7%
i 769
9.8%
o 658
8.4%
r 522
 
6.6%
l 410
 
5.2%
n 391
 
5.0%
t 373
 
4.7%
c 371
 
4.7%
Other values (21) 1606
20.4%
Uppercase Letter
ValueCountFrequency (%)
C 352
25.2%
D 153
10.9%
O 117
 
8.4%
L 90
 
6.4%
B 84
 
6.0%
P 71
 
5.1%
A 67
 
4.8%
E 64
 
4.6%
G 61
 
4.4%
M 55
 
3.9%
Other values (8) 284
20.3%
Decimal Number
ValueCountFrequency (%)
4 125
24.1%
7 113
21.8%
2 99
19.1%
1 61
11.8%
3 58
11.2%
8 27
 
5.2%
0 15
 
2.9%
9 9
 
1.7%
6 6
 
1.2%
5 5
 
1.0%
Space Separator
ValueCountFrequency (%)
1698
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 317
100.0%
Other Punctuation
ValueCountFrequency (%)
, 47
100.0%
Open Punctuation
ValueCountFrequency (%)
( 8
100.0%
Close Punctuation
ValueCountFrequency (%)
) 8
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 9256
78.1%
Common 2596
 
21.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 977
 
10.6%
a 938
 
10.1%
e 843
 
9.1%
i 769
 
8.3%
o 658
 
7.1%
r 522
 
5.6%
l 410
 
4.4%
n 391
 
4.2%
t 373
 
4.0%
c 371
 
4.0%
Other values (39) 3004
32.5%
Common
ValueCountFrequency (%)
1698
65.4%
- 317
 
12.2%
4 125
 
4.8%
7 113
 
4.4%
2 99
 
3.8%
1 61
 
2.3%
3 58
 
2.2%
, 47
 
1.8%
8 27
 
1.0%
0 15
 
0.6%
Other values (5) 36
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 11416
96.3%
None 436
 
3.7%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1698
14.9%
s 977
 
8.6%
a 938
 
8.2%
e 843
 
7.4%
i 769
 
6.7%
o 658
 
5.8%
r 522
 
4.6%
l 410
 
3.6%
n 391
 
3.4%
t 373
 
3.3%
Other values (46) 3837
33.6%
None
ValueCountFrequency (%)
á 136
31.2%
ç 67
15.4%
ã 65
14.9%
â 61
14.0%
í 48
 
11.0%
õ 46
 
10.6%
ô 12
 
2.8%
ó 1
 
0.2%
Distinct: 4
Distinct (%): 80.0%
Missing: 4267
Missing (%): 99.9%
Memory size: 33.5 KiB

C41 - Ossos e Das Cartilagens Articulares de Outras Localizações
C22 - Fígado e Das Vias Biliares Intra-hepáticas
C48 - Tecidos Moles do Retroperitônio e do Peritônio
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos

Length

Max length: 64
Median length: 59
Mean length: 57.4
Min length: 48

Characters and Unicode

Total characters: 287
Distinct characters: 48
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
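
The length and character totals reported for this variable can be checked by hand, because only five rows are non-missing. The sketch below recomputes them from the five sample values using only the standard library; the variable names are illustrative, and this is a cross-check under that assumption, not the profiler's own code.

```python
import statistics
import unicodedata

# The five non-missing values of this variable, as listed in its sample rows.
values = [
    "C22 - Fígado e Das Vias Biliares Intra-hepáticas",
    "C48 - Tecidos Moles do Retroperitônio e do Peritônio",
    "C41 - Ossos e Das Cartilagens Articulares de Outras Localizações",
    "C77 - Secundária e Não Especificada Dos Gânglios Linfáticos",
    "C41 - Ossos e Das Cartilagens Articulares de Outras Localizações",
]

# Normalize to NFC so each accented letter counts as one character.
lengths = [len(unicodedata.normalize("NFC", v)) for v in values]

summary = {
    "max": max(lengths),
    "median": statistics.median(lengths),
    "mean": statistics.mean(lengths),
    "min": min(lengths),
    "total_characters": sum(lengths),
}
print(summary)
```

The result agrees with the figures reported for this variable: maximum 64, median 59, mean 57.4, minimum 48, and 287 characters in total.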

Unique

Unique: 3
Unique (%): 60.0%

Sample

1st row: C22 - Fígado e Das Vias Biliares Intra-hepáticas
2nd row: C48 - Tecidos Moles do Retroperitônio e do Peritônio
3rd row: C41 - Ossos e Das Cartilagens Articulares de Outras Localizações
4th row: C77 - Secundária e Não Especificada Dos Gânglios Linfáticos
5th row: C41 - Ossos e Das Cartilagens Articulares de Outras Localizações

Common Values

Value | Count | Frequency (%)
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações | 2 | < 0.1%
C22 - Fígado e Das Vias Biliares Intra-hepáticas | 1 | < 0.1%
C48 - Tecidos Moles do Retroperitônio e do Peritônio | 1 | < 0.1%
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos | 1 | < 0.1%
(Missing) | 4267 | 99.9%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot: common values]
ValueCountFrequency (%)
5
 
10.9%
e 5
 
10.9%
das 3
 
6.5%
c41 2
 
4.3%
de 2
 
4.3%
outras 2
 
4.3%
localizações 2
 
4.3%
articulares 2
 
4.3%
cartilagens 2
 
4.3%
ossos 2
 
4.3%
Other values (18) 19
41.3%

Most occurring characters

ValueCountFrequency (%)
41
14.3%
s 26
 
9.1%
a 23
 
8.0%
e 22
 
7.7%
i 21
 
7.3%
o 16
 
5.6%
r 14
 
4.9%
t 12
 
4.2%
c 10
 
3.5%
l 9
 
3.1%
Other values (38) 93
32.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 198
69.0%
Space Separator 41
 
14.3%
Uppercase Letter 32
 
11.1%
Decimal Number 10
 
3.5%
Dash Punctuation 6
 
2.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 26
13.1%
a 23
11.6%
e 22
11.1%
i 21
10.6%
o 16
8.1%
r 14
 
7.1%
t 12
 
6.1%
c 10
 
5.1%
l 9
 
4.5%
d 8
 
4.0%
Other values (14) 37
18.7%
Uppercase Letter
ValueCountFrequency (%)
C 7
21.9%
D 4
12.5%
O 4
12.5%
L 3
9.4%
A 2
 
6.2%
F 1
 
3.1%
G 1
 
3.1%
E 1
 
3.1%
N 1
 
3.1%
S 1
 
3.1%
Other values (7) 7
21.9%
Decimal Number
ValueCountFrequency (%)
4 3
30.0%
7 2
20.0%
1 2
20.0%
2 2
20.0%
8 1
 
10.0%
Space Separator
ValueCountFrequency (%)
41
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 6
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 230
80.1%
Common 57
 
19.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 26
 
11.3%
a 23
 
10.0%
e 22
 
9.6%
i 21
 
9.1%
o 16
 
7.0%
r 14
 
6.1%
t 12
 
5.2%
c 10
 
4.3%
l 9
 
3.9%
d 8
 
3.5%
Other values (31) 69
30.0%
Common
ValueCountFrequency (%)
41
71.9%
- 6
 
10.5%
4 3
 
5.3%
7 2
 
3.5%
1 2
 
3.5%
2 2
 
3.5%
8 1
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 275
95.8%
None 12
 
4.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
41
14.9%
s 26
 
9.5%
a 23
 
8.4%
e 22
 
8.0%
i 21
 
7.6%
o 16
 
5.8%
r 14
 
5.1%
t 12
 
4.4%
c 10
 
3.6%
l 9
 
3.3%
Other values (31) 81
29.5%
None
ValueCountFrequency (%)
á 3
25.0%
ô 2
16.7%
õ 2
16.7%
ç 2
16.7%
ã 1
 
8.3%
í 1
 
8.3%
â 1
 
8.3%
Distinct: 16
Distinct (%): 14.4%
Missing: 4161
Missing (%): 97.4%
Memory size: 33.5 KiB

C41 - Ossos e Das Cartilagens Articulares de Outras Localizações | 23
C22 - Fígado e Das Vias Biliares Intra-hepáticas | 15
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos | 14
C71 - Encefalo | 11
C34 - Bronquios e Pulmoes | 9
Other values (11) | 39

Length

Max length: 64
Median length: 49
Mean length: 43.099099
Min length: 10

Characters and Unicode

Total characters: 4784
Distinct characters: 63
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 0.9%

Sample

1st row: C50 - Mama
2nd row: C44 - Pele nao-melanoma
3rd row: C42 - Sistema hematopoiético e reticuloendotelial
4th row: C74 - Glândula Supra-renal (Glândula Adrenal)
5th row: C06 - Outras parte da Boca

Common Values

Value | Count | Frequency (%)
C41 - Ossos e Das Cartilagens Articulares de Outras Localizações | 23 | 0.5%
C22 - Fígado e Das Vias Biliares Intra-hepáticas | 15 | 0.4%
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos | 14 | 0.3%
C71 - Encefalo | 11 | 0.3%
C34 - Bronquios e Pulmoes | 9 | 0.2%
C38 - Coração, Mediastino e Pleura, | 7 | 0.2%
C74 - Glândula Supra-renal (Glândula Adrenal) | 5 | 0.1%
C50 - Mama | 5 | 0.1%
C40 - Ossos e Cartilagens Articulares Dos Membros | 5 | 0.1%
C49 - Tecido Conjuntivo e de Outros Tecidos Moles | 4 | 0.1%
Other values (6) | 13 | 0.3%
(Missing) | 4161 | 97.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
111
 
14.0%
e 82
 
10.4%
das 38
 
4.8%
ossos 28
 
3.5%
cartilagens 28
 
3.5%
articulares 28
 
3.5%
de 27
 
3.4%
outras 24
 
3.0%
c41 23
 
2.9%
localizações 23
 
2.9%
Other values (57) 379
47.9%

Most occurring characters

ValueCountFrequency (%)
680
 
14.2%
a 393
 
8.2%
s 384
 
8.0%
e 353
 
7.4%
i 288
 
6.0%
o 253
 
5.3%
r 211
 
4.4%
l 186
 
3.9%
t 161
 
3.4%
n 160
 
3.3%
Other values (53) 1715
35.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3166
66.2%
Space Separator 680
 
14.2%
Uppercase Letter 557
 
11.6%
Decimal Number 222
 
4.6%
Dash Punctuation 133
 
2.8%
Other Punctuation 16
 
0.3%
Close Punctuation 5
 
0.1%
Open Punctuation 5
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 393
12.4%
s 384
12.1%
e 353
11.1%
i 288
9.1%
o 253
8.0%
r 211
 
6.7%
l 186
 
5.9%
t 161
 
5.1%
n 160
 
5.1%
c 151
 
4.8%
Other values (21) 626
19.8%
Uppercase Letter
ValueCountFrequency (%)
C 150
26.9%
D 57
 
10.2%
O 56
 
10.1%
L 37
 
6.6%
A 33
 
5.9%
M 27
 
4.8%
E 27
 
4.8%
B 25
 
4.5%
G 24
 
4.3%
P 23
 
4.1%
Other values (7) 98
17.6%
Decimal Number
ValueCountFrequency (%)
4 57
25.7%
7 47
21.2%
1 34
15.3%
2 32
14.4%
3 16
 
7.2%
0 14
 
6.3%
8 10
 
4.5%
5 5
 
2.3%
9 4
 
1.8%
6 3
 
1.4%
Space Separator
ValueCountFrequency (%)
680
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 133
100.0%
Other Punctuation
ValueCountFrequency (%)
, 16
100.0%
Close Punctuation
ValueCountFrequency (%)
) 5
100.0%
Open Punctuation
ValueCountFrequency (%)
( 5
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3723
77.8%
Common 1061
 
22.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 393
 
10.6%
s 384
 
10.3%
e 353
 
9.5%
i 288
 
7.7%
o 253
 
6.8%
r 211
 
5.7%
l 186
 
5.0%
t 161
 
4.3%
n 160
 
4.3%
c 151
 
4.1%
Other values (38) 1183
31.8%
Common
ValueCountFrequency (%)
680
64.1%
- 133
 
12.5%
4 57
 
5.4%
7 47
 
4.4%
1 34
 
3.2%
2 32
 
3.0%
3 16
 
1.5%
, 16
 
1.5%
0 14
 
1.3%
8 10
 
0.9%
Other values (5) 22
 
2.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4620
96.6%
None 164
 
3.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
680
14.7%
a 393
 
8.5%
s 384
 
8.3%
e 353
 
7.6%
i 288
 
6.2%
o 253
 
5.5%
r 211
 
4.6%
l 186
 
4.0%
t 161
 
3.5%
n 160
 
3.5%
Other values (45) 1551
33.6%
None
ValueCountFrequency (%)
á 43
26.2%
ç 30
18.3%
â 24
14.6%
õ 23
14.0%
ã 21
12.8%
í 15
 
9.1%
ô 6
 
3.7%
é 2
 
1.2%
Distinct: 1
Distinct (%): 100.0%
Missing: 4271
Missing (%): > 99.9%
Memory size: 33.5 KiB

C77 - Secundária e Não Especificada Dos Gânglios Linfáticos

Length

Max length: 59
Median length: 59
Mean length: 59
Min length: 59

Characters and Unicode

Total characters: 59
Distinct characters: 28
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 100.0%

Sample

1st row: C77 - Secundária e Não Especificada Dos Gânglios Linfáticos

Common Values

Value | Count | Frequency (%)
C77 - Secundária e Não Especificada Dos Gânglios Linfáticos | 1 | < 0.1%
(Missing) | 4271 | > 99.9%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot: common values]
ValueCountFrequency (%)
c77 1
11.1%
1
11.1%
secundária 1
11.1%
e 1
11.1%
não 1
11.1%
especificada 1
11.1%
dos 1
11.1%
gânglios 1
11.1%
linfáticos 1
11.1%

Most occurring characters

ValueCountFrequency (%)
8
 
13.6%
i 6
 
10.2%
s 4
 
6.8%
o 4
 
6.8%
c 4
 
6.8%
e 3
 
5.1%
a 3
 
5.1%
n 3
 
5.1%
á 2
 
3.4%
7 2
 
3.4%
Other values (18) 20
33.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 41
69.5%
Space Separator 8
 
13.6%
Uppercase Letter 7
 
11.9%
Decimal Number 2
 
3.4%
Dash Punctuation 1
 
1.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 6
14.6%
s 4
9.8%
o 4
9.8%
c 4
9.8%
e 3
 
7.3%
a 3
 
7.3%
n 3
 
7.3%
á 2
 
4.9%
d 2
 
4.9%
f 2
 
4.9%
Other values (8) 8
19.5%
Uppercase Letter
ValueCountFrequency (%)
G 1
14.3%
L 1
14.3%
D 1
14.3%
C 1
14.3%
N 1
14.3%
E 1
14.3%
S 1
14.3%
Space Separator
ValueCountFrequency (%)
8
100.0%
Decimal Number
ValueCountFrequency (%)
7 2
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 48
81.4%
Common 11
 
18.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 6
 
12.5%
s 4
 
8.3%
o 4
 
8.3%
c 4
 
8.3%
e 3
 
6.2%
a 3
 
6.2%
n 3
 
6.2%
á 2
 
4.2%
d 2
 
4.2%
f 2
 
4.2%
Other values (15) 15
31.2%
Common
ValueCountFrequency (%)
8
72.7%
7 2
 
18.2%
- 1
 
9.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 55
93.2%
None 4
 
6.8%

Most frequent character per block

ASCII
ValueCountFrequency (%)
8
14.5%
i 6
 
10.9%
s 4
 
7.3%
o 4
 
7.3%
c 4
 
7.3%
e 3
 
5.5%
a 3
 
5.5%
n 3
 
5.5%
7 2
 
3.6%
d 2
 
3.6%
Other values (15) 16
29.1%
None
ValueCountFrequency (%)
á 2
50.0%
â 1
25.0%
ã 1
25.0%
Distinct: 43
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 33.5 KiB

CARCINOMA DUCTAL INFILTRANTE SOE | 3793
CARCINOMA LOBULAR SOE | 140
ADENOCARCINOMA MUCINOSO | 49
CARCINOMA METAPLASICO SOE | 46
CARCINOMA PAPILAR SOE | 38
Other values (38) | 206

Length

Max length: 65
Median length: 32
Mean length: 31.561564
Min length: 11

Characters and Unicode

Total characters: 134831
Distinct characters: 21
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 13
Unique (%): 0.3%

Sample

1st row: CARCINOMA DUCTAL INFILTRANTE SOE
2nd row: CARCINOMA DUCTAL INFILTRANTE SOE
3rd row: ADENOCARCINOMA MUCINOSO
4th row: CARCINOMA DUCTAL INFILTRANTE SOE
5th row: CARCINOMA DUCTAL INFILTRANTE SOE

Common Values

Value | Count | Frequency (%)
CARCINOMA DUCTAL INFILTRANTE SOE | 3793 | 88.8%
CARCINOMA LOBULAR SOE | 140 | 3.3%
ADENOCARCINOMA MUCINOSO | 49 | 1.1%
CARCINOMA METAPLASICO SOE | 46 | 1.1%
CARCINOMA PAPILAR SOE | 38 | 0.9%
CARCINOMA INTRADUCTAL NAO INFILTRANTE SOE | 30 | 0.7%
ADENOCARCINOMA PAPILAR INTRADUCTAL COM INVASAO | 19 | 0.4%
CARCINOMA DE CELULAS ACINOSAS | 19 | 0.4%
CARCINOMA DUCTAL INFILTRATIVO MISTO COM OUTROS TIPOS DE CARCINOMA | 18 | 0.4%
CARCINOMA DUCTAL INFILTRANTE E LOBULAR | 14 | 0.3%
Other values (33) | 106 | 2.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
carcinoma 4194
24.8%
soe 4092
24.2%
infiltrante 3857
22.8%
ductal 3833
22.7%
lobular 164
 
1.0%
adenocarcinoma 90
 
0.5%
papilar 65
 
0.4%
de 63
 
0.4%
intraductal 51
 
0.3%
mucinoso 49
 
0.3%
Other values (48) 449
 
2.7%

Most occurring characters

ValueCountFrequency (%)
A 17125
12.7%
C 12706
9.4%
12635
9.4%
I 12438
9.2%
N 12330
9.1%
T 11859
8.8%
O 9113
6.8%
R 8526
 
6.3%
L 8286
 
6.1%
E 8282
 
6.1%
Other values (11) 21531
16.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 122196
90.6%
Space Separator 12635
 
9.4%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 17125
14.0%
C 12706
10.4%
I 12438
10.2%
N 12330
10.1%
T 11859
9.7%
O 9113
7.5%
R 8526
7.0%
L 8286
6.8%
E 8282
6.8%
M 4516
 
3.7%
Other values (10) 17015
13.9%
Space Separator
ValueCountFrequency (%)
12635
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 122196
90.6%
Common 12635
 
9.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 17125
14.0%
C 12706
10.4%
I 12438
10.2%
N 12330
10.1%
T 11859
9.7%
O 9113
7.5%
R 8526
7.0%
L 8286
6.8%
E 8282
6.8%
M 4516
 
3.7%
Other values (10) 17015
13.9%
Common
ValueCountFrequency (%)
12635
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 134831
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 17125
12.7%
C 12706
9.4%
12635
9.4%
I 12438
9.2%
N 12330
9.1%
T 11859
8.8%
O 9113
6.8%
R 8526
 
6.3%
L 8286
 
6.1%
E 8282
 
6.1%
Other values (11) 21531
16.0%

descricao_da_morfologia_de_acordo_com_cid_o_2
Categorical

HIGH CARDINALITY  MISSING 

Distinct: 73
Distinct (%): 19.8%
Missing: 3903
Missing (%): 91.4%
Memory size: 33.5 KiB

CARCINOMA DUCTAL INFILTRANTE SOE | 111
CARCINOMA INTRADUCTAL NAO INFILTRANTE SOE | 43
CARCINOMA ESCAMOCELULAR SOE | 25
ADENOCARCINOMA SOE | 23
CARCINOMA BASOCELULAR NODULAR | 13
Other values (68) | 154

Length

Max length: 67
Median length: 58
Mean length: 30.802168
Min length: 12

Characters and Unicode

Total characters: 11366
Distinct characters: 38
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 40
Unique (%): 10.8%

Sample

1st row: ADENOCARCINOMA SOE
2nd row: ADENOCARCINOMA SOE
3rd row: CARCINOMA DUCTAL INFILTRANTE SOE
4th row: MELANOMA DE PROPAGACAO SUPERFICIAL
5th row: MELANOMA MALIGNO SOE

Common Values

Value | Count | Frequency (%)
CARCINOMA DUCTAL INFILTRANTE SOE | 111 | 2.6%
CARCINOMA INTRADUCTAL NAO INFILTRANTE SOE | 43 | 1.0%
CARCINOMA ESCAMOCELULAR SOE | 25 | 0.6%
ADENOCARCINOMA SOE | 23 | 0.5%
CARCINOMA BASOCELULAR NODULAR | 13 | 0.3%
CARCINOMA LOBULAR SOE | 11 | 0.3%
CARCINOMA DE CELULAS RENAIS SOE | 11 | 0.3%
ADENOCARCINOMA TUBULAR | 10 | 0.2%
ADENOCARCINOMA PAPILAR SOE | 9 | 0.2%
CARCINOMA DE CELULAS ACINOSAS | 8 | 0.2%
Other values (63) | 105 | 2.5%
(Missing) | 3903 | 91.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
soe 290
20.7%
carcinoma 265
18.9%
infiltrante 156
11.1%
ductal 113
 
8.1%
adenocarcinoma 65
 
4.6%
de 48
 
3.4%
intraductal 47
 
3.4%
nao 47
 
3.4%
celulas 34
 
2.4%
escamocelular 30
 
2.1%
Other values (115) 306
21.8%

Most occurring characters

ValueCountFrequency (%)
A 1564
13.8%
C 1034
9.1%
1032
9.1%
O 1015
8.9%
N 958
8.4%
I 914
8.0%
E 820
7.2%
R 748
 
6.6%
L 634
 
5.6%
T 610
 
5.4%
Other values (28) 2037
17.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 10310
90.7%
Space Separator 1032
 
9.1%
Decimal Number 10
 
0.1%
Dash Punctuation 4
 
< 0.1%
Other Punctuation 4
 
< 0.1%
Open Punctuation 3
 
< 0.1%
Close Punctuation 3
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 1564
15.2%
C 1034
10.0%
O 1015
9.8%
N 958
9.3%
I 914
8.9%
E 820
8.0%
R 748
7.3%
L 634
6.1%
T 610
 
5.9%
S 499
 
4.8%
Other values (14) 1514
14.7%
Decimal Number
ValueCountFrequency (%)
2 2
20.0%
0 2
20.0%
8 1
10.0%
4 1
10.0%
1 1
10.0%
9 1
10.0%
5 1
10.0%
3 1
10.0%
Other Punctuation
ValueCountFrequency (%)
/ 2
50.0%
\ 2
50.0%
Space Separator
ValueCountFrequency (%)
1032
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 4
100.0%
Open Punctuation
ValueCountFrequency (%)
( 3
100.0%
Close Punctuation
ValueCountFrequency (%)
) 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 10310
90.7%
Common 1056
 
9.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 1564
15.2%
C 1034
10.0%
O 1015
9.8%
N 958
9.3%
I 914
8.9%
E 820
8.0%
R 748
7.3%
L 634
6.1%
T 610
 
5.9%
S 499
 
4.8%
Other values (14) 1514
14.7%
Common
ValueCountFrequency (%)
1032
97.7%
- 4
 
0.4%
( 3
 
0.3%
) 3
 
0.3%
2 2
 
0.2%
0 2
 
0.2%
/ 2
 
0.2%
\ 2
 
0.2%
8 1
 
0.1%
4 1
 
0.1%
Other values (4) 4
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 11366
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 1564
13.8%
C 1034
9.1%
1032
9.1%
O 1015
8.9%
N 958
8.4%
I 914
8.0%
E 820
7.2%
R 748
 
6.6%
L 634
 
5.6%
T 610
 
5.4%
Other values (28) 2037
17.9%
Distinct: 13
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 33.5 KiB

MAMA SOE (EXCLUI PELE DA MAMA C44.5) | 1927
MAMA QUADRANTE SUPERIOR EXTERNO DA | 1155
MAMA QUADRANTE SUPERIOR INTERNO DA | 282
MAMA QUADRANTE INFERIOR EXTERNO DA | 247
MAMA LESAO SOBREPOSTA DA | 175
Other values (8) | 486

Length

Max length: 36
Median length: 34
Mean length: 33.171348
Min length: 4

Characters and Unicode

Total characters: 141708
Distinct characters: 25
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): 0.1%

Sample

1st row: MAMA QUADRANTE SUPERIOR EXTERNO DA
2nd row: MAMA LESAO SOBREPOSTA DA
3rd row: MAMA SOE (EXCLUI PELE DA MAMA C44.5)
4th row: MAMA QUADRANTE INFERIOR EXTERNO DA
5th row: MAMA LESAO SOBREPOSTA DA

Common Values

Value | Count | Frequency (%)
MAMA SOE (EXCLUI PELE DA MAMA C44.5) | 1927 | 45.1%
MAMA QUADRANTE SUPERIOR EXTERNO DA | 1155 | 27.0%
MAMA QUADRANTE SUPERIOR INTERNO DA | 282 | 6.6%
MAMA QUADRANTE INFERIOR EXTERNO DA | 247 | 5.8%
MAMA LESAO SOBREPOSTA DA | 175 | 4.1%
MAMA QUADRANTE INFERIOR INTERNO DA | 174 | 4.1%
MAMA MAMILO | 168 | 3.9%
MAMA PORCAO CENTRAL DA | 131 | 3.1%
MAMA PORCAO AXILAR DA | 9 | 0.2%
ASSOALHO DA BOCA SOE | 1 | < 0.1%
Other values (3) | 3 | 0.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
mama 6195
25.4%
da 4101
16.8%
soe 1929
 
7.9%
exclui 1927
 
7.9%
pele 1927
 
7.9%
c44.5 1927
 
7.9%
quadrante 1858
 
7.6%
superior 1437
 
5.9%
externo 1402
 
5.7%
interno 456
 
1.9%
Other values (14) 1226
 
5.0%

Most occurring characters

ValueCountFrequency (%)
A 21017
14.8%
20113
14.2%
E 15170
10.7%
M 12726
 
9.0%
R 7889
 
5.6%
O 6627
 
4.7%
D 5960
 
4.2%
U 5223
 
3.7%
I 4839
 
3.4%
N 4724
 
3.3%
Other values (15) 37420
26.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 110033
77.6%
Space Separator 20113
 
14.2%
Decimal Number 5781
 
4.1%
Close Punctuation 1927
 
1.4%
Other Punctuation 1927
 
1.4%
Open Punctuation 1927
 
1.4%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 21017
19.1%
E 15170
13.8%
M 12726
11.6%
R 7889
 
7.2%
O 6627
 
6.0%
D 5960
 
5.4%
U 5223
 
4.7%
I 4839
 
4.4%
N 4724
 
4.3%
L 4339
 
3.9%
Other values (9) 21519
19.6%
Decimal Number
ValueCountFrequency (%)
4 3854
66.7%
5 1927
33.3%
Space Separator
ValueCountFrequency (%)
20113
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1927
100.0%
Other Punctuation
ValueCountFrequency (%)
. 1927
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1927
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 110033
77.6%
Common 31675
 
22.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 21017
19.1%
E 15170
13.8%
M 12726
11.6%
R 7889
 
7.2%
O 6627
 
6.0%
D 5960
 
5.4%
U 5223
 
4.7%
I 4839
 
4.4%
N 4724
 
4.3%
L 4339
 
3.9%
Other values (9) 21519
19.6%
Common
ValueCountFrequency (%)
20113
63.5%
4 3854
 
12.2%
5 1927
 
6.1%
) 1927
 
6.1%
. 1927
 
6.1%
( 1927
 
6.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 141708
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 21017
14.8%
20113
14.2%
E 15170
10.7%
M 12726
 
9.0%
R 7889
 
5.6%
O 6627
 
4.7%
D 5960
 
4.2%
U 5223
 
3.7%
I 4839
 
3.4%
N 4724
 
3.3%
Other values (15) 37420
26.4%

descricao_da_topografia_2
Categorical

HIGH CARDINALITY  MISSING 

Distinct: 72
Distinct (%): 19.5%
Missing: 3903
Missing (%): 91.4%
Memory size: 33.5 KiB

MAMA SOE (EXCLUI PELE DA MAMA C44.5) | 87
MAMA QUADRANTE SUPERIOR EXTERNO DA | 42
GLANDULA TIREOIDE | 17
MAMA QUADRANTE SUPERIOR INTERNO DA | 15
PELE DO OMBRO E MEMBROS SUPERIORES | 12
Other values (67) | 196

Length

Max length: 77
Median length: 59
Mean length: 26.96748
Min length: 4

Characters and Unicode

Total characters: 9951
Distinct characters: 30
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 34
Unique (%): 9.2%

Sample

1st row: COLO DO UTERO
2nd row: COLON DESCENDENTE
3rd row: MAMA SOE (EXCLUI PELE DA MAMA C44.5)
4th row: PELE DO TRONCO
5th row: PELE DO QUADRIL E MEMBROS INFERIORES

Common Values

Value | Count | Frequency (%)
MAMA SOE (EXCLUI PELE DA MAMA C44.5) | 87 | 2.0%
MAMA QUADRANTE SUPERIOR EXTERNO DA | 42 | 1.0%
GLANDULA TIREOIDE | 17 | 0.4%
MAMA QUADRANTE SUPERIOR INTERNO DA | 15 | 0.4%
PELE DO OMBRO E MEMBROS SUPERIORES | 12 | 0.3%
RIM SOE | 12 | 0.3%
MAMA QUADRANTE INFERIOR INTERNO DA | 12 | 0.3%
MAMA LESAO SOBREPOSTA DA | 11 | 0.3%
MAMA QUADRANTE INFERIOR EXTERNO DA | 10 | 0.2%
PELE DE OUTRAS PARTES E DE PARTES NAO ESPECIFICADAS DA FACE | 9 | 0.2%
Other values (62) | 142 | 3.3%
(Missing) | 3903 | 91.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
mama 274
16.2%
da 197
 
11.7%
pele 125
 
7.4%
soe 120
 
7.1%
exclui 87
 
5.2%
c44.5 87
 
5.2%
quadrante 79
 
4.7%
superior 65
 
3.9%
do 60
 
3.6%
externo 54
 
3.2%
Other values (122) 539
32.0%

Most occurring characters

Value | Count | Frequency (%)
(space) | 1319 | 13.3%
A | 1218 | 12.2%
E | 1125 | 11.3%
O | 760 | 7.6%
M | 673 | 6.8%
R | 618 | 6.2%
D | 458 | 4.6%
I | 437 | 4.4%
S | 397 | 4.0%
L | 358 | 3.6%
Other values (20) | 2588 | 26.0%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 8105 | 81.4%
Space Separator | 1319 | 13.3%
Decimal Number | 261 | 2.6%
Other Punctuation | 89 | 0.9%
Close Punctuation | 88 | 0.9%
Open Punctuation | 88 | 0.9%
Dash Punctuation | 1 | < 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
A | 1218 | 15.0%
E | 1125 | 13.9%
O | 760 | 9.4%
M | 673 | 8.3%
R | 618 | 7.6%
D | 458 | 5.7%
I | 437 | 5.4%
S | 397 | 4.9%
L | 358 | 4.4%
U | 338 | 4.2%
Other values (13) | 1723 | 21.3%

Decimal Number
Value | Count | Frequency (%)
4 | 174 | 66.7%
5 | 87 | 33.3%

Space Separator
Value | Count | Frequency (%)
(space) | 1319 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
. | 89 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 88 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 88 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 8105 | 81.4%
Common | 1846 | 18.6%

Most frequent character per script

Latin
Value | Count | Frequency (%)
A | 1218 | 15.0%
E | 1125 | 13.9%
O | 760 | 9.4%
M | 673 | 8.3%
R | 618 | 7.6%
D | 458 | 5.7%
I | 437 | 5.4%
S | 397 | 4.9%
L | 358 | 4.4%
U | 338 | 4.2%
Other values (13) | 1723 | 21.3%

Common
Value | Count | Frequency (%)
(space) | 1319 | 71.5%
4 | 174 | 9.4%
. | 89 | 4.8%
) | 88 | 4.8%
( | 88 | 4.8%
5 | 87 | 4.7%
- | 1 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 9951 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 1319 | 13.3%
A | 1218 | 12.2%
E | 1125 | 11.3%
O | 760 | 7.6%
M | 673 | 6.8%
R | 618 | 6.2%
D | 458 | 4.6%
I | 437 | 4.4%
S | 397 | 4.0%
L | 358 | 3.6%
Other values (20) | 2588 | 26.0%

classificacao_tnm_patologico_n_1
Categorical

Distinct: 10
Distinct (%): 5.4%
Missing: 4086
Missing (%): 95.6%
Memory size: 33.5 KiB

Value | Count
0 | 101
1 | 44
2 | 14
2A | 9
3 | 8
Other values (5) | 10

Length

Max length: 31
Median length: 1
Mean length: 1.4301075
Min length: 1

Characters and Unicode

Total characters: 266
Distinct characters: 27
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): 1.1%

Sample

1st row: X - nao foi possivel determinar
2nd row: 1
3rd row: 1
4th row: 0
5th row: 3

Common Values

Value | Count | Frequency (%)
0 | 101 | 2.4%
1 | 44 | 1.0%
2 | 14 | 0.3%
2A | 9 | 0.2%
3 | 8 | 0.2%
3A | 4 | 0.1%
X - nao foi possivel determinar | 2 | < 0.1%
3B | 2 | < 0.1%
3C | 1 | < 0.1%
Y: Na | 1 | < 0.1%
(Missing) | 4086 | 95.6%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: image not preserved]

Value | Count | Frequency (%)
0 | 101 | 51.3%
1 | 44 | 22.3%
2 | 14 | 7.1%
2a | 9 | 4.6%
3 | 8 | 4.1%
3a | 4 | 2.0%
x | 2 | 1.0%
(blank) | 2 | 1.0%
nao | 2 | 1.0%
foi | 2 | 1.0%
Other values (6) | 9 | 4.6%

Most occurring characters

Value | Count | Frequency (%)
0 | 101 | 38.0%
1 | 44 | 16.5%
2 | 23 | 8.6%
3 | 15 | 5.6%
A | 13 | 4.9%
(space) | 11 | 4.1%
o | 6 | 2.3%
i | 6 | 2.3%
e | 6 | 2.3%
a | 5 | 1.9%
Other values (17) | 36 | 13.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 183 | 68.8%
Lowercase Letter | 49 | 18.4%
Uppercase Letter | 20 | 7.5%
Space Separator | 11 | 4.1%
Dash Punctuation | 2 | 0.8%
Other Punctuation | 1 | 0.4%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
o | 6 | 12.2%
i | 6 | 12.2%
e | 6 | 12.2%
a | 5 | 10.2%
r | 4 | 8.2%
s | 4 | 8.2%
n | 4 | 8.2%
d | 2 | 4.1%
m | 2 | 4.1%
t | 2 | 4.1%
Other values (4) | 8 | 16.3%

Uppercase Letter
Value | Count | Frequency (%)
A | 13 | 65.0%
B | 2 | 10.0%
X | 2 | 10.0%
C | 1 | 5.0%
Y | 1 | 5.0%
N | 1 | 5.0%

Decimal Number
Value | Count | Frequency (%)
0 | 101 | 55.2%
1 | 44 | 24.0%
2 | 23 | 12.6%
3 | 15 | 8.2%

Space Separator
Value | Count | Frequency (%)
(space) | 11 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 2 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
: | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 197 | 74.1%
Latin | 69 | 25.9%

Most frequent character per script

Latin
Value | Count | Frequency (%)
A | 13 | 18.8%
o | 6 | 8.7%
i | 6 | 8.7%
e | 6 | 8.7%
a | 5 | 7.2%
r | 4 | 5.8%
s | 4 | 5.8%
n | 4 | 5.8%
d | 2 | 2.9%
m | 2 | 2.9%
Other values (10) | 17 | 24.6%

Common
Value | Count | Frequency (%)
0 | 101 | 51.3%
1 | 44 | 22.3%
2 | 23 | 11.7%
3 | 15 | 7.6%
(space) | 11 | 5.6%
- | 2 | 1.0%
: | 1 | 0.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 266 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 101 | 38.0%
1 | 44 | 16.5%
2 | 23 | 8.6%
3 | 15 | 5.6%
A | 13 | 4.9%
(space) | 11 | 4.1%
o | 6 | 2.3%
i | 6 | 2.3%
e | 6 | 2.3%
a | 5 | 1.9%
Other values (17) | 36 | 13.5%

classificacao_tnm_patologico_n_2
Categorical

Distinct: 2
Distinct (%): 40.0%
Missing: 4267
Missing (%): 99.9%
Memory size: 33.5 KiB

0
1

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 5
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 20.0%

Sample

1st row: 0
2nd row: 0
3rd row: 1
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 4 | 0.1%
1 | 1 | < 0.1%
(Missing) | 4267 | 99.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: image not preserved]

Value | Count | Frequency (%)
0 | 4 | 80.0%
1 | 1 | 20.0%

Most occurring characters

Value | Count | Frequency (%)
0 | 4 | 80.0%
1 | 1 | 20.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 5 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 4 | 80.0%
1 | 1 | 20.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 5 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 4 | 80.0%
1 | 1 | 20.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 5 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 4 | 80.0%
1 | 1 | 20.0%

classificacao_tnm_patologico_t_1
Categorical

Distinct: 14
Distinct (%): 7.5%
Missing: 4085
Missing (%): 95.6%
Memory size: 33.5 KiB

Value | Count
2 | 81
1C | 33
3 | 21
1 | 16
1B | 11
Other values (9) | 25

Length

Max length: 5
Median length: 1
Mean length: 1.3957219
Min length: 1

Characters and Unicode

Total characters: 261
Distinct characters: 17
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5
Unique (%): 2.7%

Sample

1st row: 2
2nd row: 1A
3rd row: 1C
4th row: 2
5th row: 3

Common Values

Value | Count | Frequency (%)
2 | 81 | 1.9%
1C | 33 | 0.8%
3 | 21 | 0.5%
1 | 16 | 0.4%
1B | 11 | 0.3%
1A | 7 | 0.2%
4B | 7 | 0.2%
IS | 3 | 0.1%
IV | 3 | 0.1%
4D | 1 | < 0.1%
Other values (4) | 4 | 0.1%
(Missing) | 4085 | 95.6%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
2 | 81 | 43.1%
1c | 33 | 17.6%
3 | 21 | 11.2%
1 | 16 | 8.5%
1b | 11 | 5.9%
1a | 7 | 3.7%
4b | 7 | 3.7%
is | 3 | 1.6%
iv | 3 | 1.6%
4d | 1 | 0.5%
Other values (5) | 5 | 2.7%

Most occurring characters

Value | Count | Frequency (%)
2 | 82 | 31.4%
1 | 68 | 26.1%
C | 36 | 13.8%
3 | 21 | 8.0%
B | 18 | 6.9%
4 | 9 | 3.4%
A | 7 | 2.7%
I | 7 | 2.7%
V | 3 | 1.1%
S | 3 | 1.1%
Other values (7) | 7 | 2.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 180 | 69.0%
Uppercase Letter | 78 | 29.9%
Other Punctuation | 1 | 0.4%
Space Separator | 1 | 0.4%
Lowercase Letter | 1 | 0.4%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
C | 36 | 46.2%
B | 18 | 23.1%
A | 7 | 9.0%
I | 7 | 9.0%
V | 3 | 3.8%
S | 3 | 3.8%
D | 1 | 1.3%
M | 1 | 1.3%
Y | 1 | 1.3%
N | 1 | 1.3%

Decimal Number
Value | Count | Frequency (%)
2 | 82 | 45.6%
1 | 68 | 37.8%
3 | 21 | 11.7%
4 | 9 | 5.0%

Other Punctuation
Value | Count | Frequency (%)
: | 1 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 1 | 100.0%

Lowercase Letter
Value | Count | Frequency (%)
a | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 182 | 69.7%
Latin | 79 | 30.3%

Most frequent character per script

Latin
Value | Count | Frequency (%)
C | 36 | 45.6%
B | 18 | 22.8%
A | 7 | 8.9%
I | 7 | 8.9%
V | 3 | 3.8%
S | 3 | 3.8%
D | 1 | 1.3%
M | 1 | 1.3%
Y | 1 | 1.3%
N | 1 | 1.3%

Common
Value | Count | Frequency (%)
2 | 82 | 45.1%
1 | 68 | 37.4%
3 | 21 | 11.5%
4 | 9 | 4.9%
: | 1 | 0.5%
(space) | 1 | 0.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 261 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
2 | 82 | 31.4%
1 | 68 | 26.1%
C | 36 | 13.8%
3 | 21 | 8.0%
B | 18 | 6.9%
4 | 9 | 3.4%
A | 7 | 2.7%
I | 7 | 2.7%
V | 3 | 1.1%
S | 3 | 1.1%
Other values (7) | 7 | 2.7%

classificacao_tnm_patologico_t_2
Categorical

Distinct: 3
Distinct (%): 60.0%
Missing: 4267
Missing (%): 99.9%
Memory size: 33.5 KiB

1B
1
IV

Length

Max length: 2
Median length: 2
Mean length: 1.6
Min length: 1

Characters and Unicode

Total characters: 8
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 20.0%

Sample

1st row: 1B
2nd row: 1
3rd row: 1B
4th row: IV
5th row: 1

Common Values

Value | Count | Frequency (%)
1B | 2 | < 0.1%
1 | 2 | < 0.1%
IV | 1 | < 0.1%
(Missing) | 4267 | 99.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: image not preserved]

Value | Count | Frequency (%)
1b | 2 | 40.0%
1 | 2 | 40.0%
iv | 1 | 20.0%

Most occurring characters

Value | Count | Frequency (%)
1 | 4 | 50.0%
B | 2 | 25.0%
I | 1 | 12.5%
V | 1 | 12.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 4 | 50.0%
Uppercase Letter | 4 | 50.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
B | 2 | 50.0%
I | 1 | 25.0%
V | 1 | 25.0%

Decimal Number
Value | Count | Frequency (%)
1 | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4 | 50.0%
Latin | 4 | 50.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
B | 2 | 50.0%
I | 1 | 25.0%
V | 1 | 25.0%

Common
Value | Count | Frequency (%)
1 | 4 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 8 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 4 | 50.0%
B | 2 | 25.0%
I | 1 | 12.5%
V | 1 | 12.5%

com_recidiva_a_distancia_1
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.5 KiB

Value | Count
Não | 3503
Sim | 769

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 12816
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Não
2nd row: Não
3rd row: Não
4th row: Não
5th row: Não

Common Values

Value | Count | Frequency (%)
Não | 3503 | 82.0%
Sim | 769 | 18.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: image not preserved]

Value | Count | Frequency (%)
não | 3503 | 82.0%
sim | 769 | 18.0%

Most occurring characters

Value | Count | Frequency (%)
N | 3503 | 27.3%
ã | 3503 | 27.3%
o | 3503 | 27.3%
S | 769 | 6.0%
i | 769 | 6.0%
m | 769 | 6.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 8544 | 66.7%
Uppercase Letter | 4272 | 33.3%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
ã | 3503 | 41.0%
o | 3503 | 41.0%
i | 769 | 9.0%
m | 769 | 9.0%

Uppercase Letter
Value | Count | Frequency (%)
N | 3503 | 82.0%
S | 769 | 18.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 12816 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
N | 3503 | 27.3%
ã | 3503 | 27.3%
o | 3503 | 27.3%
S | 769 | 6.0%
i | 769 | 6.0%
m | 769 | 6.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 9313 | 72.7%
None | 3503 | 27.3%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
N | 3503 | 37.6%
o | 3503 | 37.6%
S | 769 | 8.3%
i | 769 | 8.3%
m | 769 | 8.3%

None
Value | Count | Frequency (%)
ã | 3503 | 100.0%

com_recidiva_a_distancia_2
Categorical

IMBALANCE  MISSING 

Distinct: 2
Distinct (%): 0.5%
Missing: 3903
Missing (%): 91.4%
Memory size: 33.5 KiB

Value | Count
Não | 329
Sim | 40

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 1107
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Não
2nd row: Sim
3rd row: Não
4th row: Não
5th row: Não

Common Values

Value | Count | Frequency (%)
Não | 329 | 7.7%
Sim | 40 | 0.9%
(Missing) | 3903 | 91.4%
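
The report does not state how the IMBALANCE score is computed. One formula that reproduces the 50.5% listed for this variable in the Alerts is one minus the Shannon entropy of the class frequencies, normalised by the maximum entropy for that number of classes. The sketch below works under that assumption; the function name `imbalance_score` is hypothetical and the code is not taken from the profiling tool.

```python
import math


def imbalance_score(counts):
    """Return 1 - H(p) / log2(k) for class counts (assumed definition).

    0.0 means perfectly balanced classes; values near 1.0 mean that
    one class dominates.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        return 0.0
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in counts if c > 0
    )
    return 1.0 - entropy / math.log2(k)


# Counts from the Common Values table above: Não = 329, Sim = 40.
score = imbalance_score([329, 40])
print(f"{score:.1%}")  # 50.5%
```

The same function applied to the com_recidiva_regional_2 counts (358 and 11) gives about 80.7%, which also agrees with the value listed in the Alerts.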

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: image not preserved]

Value | Count | Frequency (%)
não | 329 | 89.2%
sim | 40 | 10.8%

Most occurring characters

Value | Count | Frequency (%)
N | 329 | 29.7%
ã | 329 | 29.7%
o | 329 | 29.7%
S | 40 | 3.6%
i | 40 | 3.6%
m | 40 | 3.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 738 | 66.7%
Uppercase Letter | 369 | 33.3%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
ã | 329 | 44.6%
o | 329 | 44.6%
i | 40 | 5.4%
m | 40 | 5.4%

Uppercase Letter
Value | Count | Frequency (%)
N | 329 | 89.2%
S | 40 | 10.8%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1107 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
N | 329 | 29.7%
ã | 329 | 29.7%
o | 329 | 29.7%
S | 40 | 3.6%
i | 40 | 3.6%
m | 40 | 3.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 778 | 70.3%
None | 329 | 29.7%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
N | 329 | 42.3%
o | 329 | 42.3%
S | 40 | 5.1%
i | 40 | 5.1%
m | 40 | 5.1%

None
Value | Count | Frequency (%)
ã | 329 | 100.0%

com_recidiva_regional_1
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.5 KiB

Value | Count
Não | 4002
Sim | 270

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 12816
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Não
2nd row: Sim
3rd row: Não
4th row: Sim
5th row: Não

Common Values

Value | Count | Frequency (%)
Não | 4002 | 93.7%
Sim | 270 | 6.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: image not preserved]

Value | Count | Frequency (%)
não | 4002 | 93.7%
sim | 270 | 6.3%

Most occurring characters

Value | Count | Frequency (%)
N | 4002 | 31.2%
ã | 4002 | 31.2%
o | 4002 | 31.2%
S | 270 | 2.1%
i | 270 | 2.1%
m | 270 | 2.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 8544 | 66.7%
Uppercase Letter | 4272 | 33.3%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
ã | 4002 | 46.8%
o | 4002 | 46.8%
i | 270 | 3.2%
m | 270 | 3.2%

Uppercase Letter
Value | Count | Frequency (%)
N | 4002 | 93.7%
S | 270 | 6.3%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 12816 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
N | 4002 | 31.2%
ã | 4002 | 31.2%
o | 4002 | 31.2%
S | 270 | 2.1%
i | 270 | 2.1%
m | 270 | 2.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 8814 | 68.8%
None | 4002 | 31.2%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
N | 4002 | 45.4%
o | 4002 | 45.4%
S | 270 | 3.1%
i | 270 | 3.1%
m | 270 | 3.1%

None
Value | Count | Frequency (%)
ã | 4002 | 100.0%

com_recidiva_regional_2
Categorical

IMBALANCE  MISSING 

Distinct: 2
Distinct (%): 0.5%
Missing: 3903
Missing (%): 91.4%
Memory size: 33.5 KiB

Value | Count
Não | 358
Sim | 11

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 1107
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Não
2nd row: Não
3rd row: Não
4th row: Não
5th row: Sim

Common Values

Value | Count | Frequency (%)
Não | 358 | 8.4%
Sim | 11 | 0.3%
(Missing) | 3903 | 91.4%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: image not preserved]

Value | Count | Frequency (%)
não | 358 | 97.0%
sim | 11 | 3.0%

Most occurring characters

Value | Count | Frequency (%)
N | 358 | 32.3%
ã | 358 | 32.3%
o | 358 | 32.3%
S | 11 | 1.0%
i | 11 | 1.0%
m | 11 | 1.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 738 | 66.7%
Uppercase Letter | 369 | 33.3%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
ã | 358 | 48.5%
o | 358 | 48.5%
i | 11 | 1.5%
m | 11 | 1.5%

Uppercase Letter
Value | Count | Frequency (%)
N | 358 | 97.0%
S | 11 | 3.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1107 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
N | 358 | 32.3%
ã | 358 | 32.3%
o | 358 | 32.3%
S | 11 | 1.0%
i | 11 | 1.0%
m | 11 | 1.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 749 | 67.7%
None | 358 | 32.3%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
N | 358 | 47.8%
o | 358 | 47.8%
S | 11 | 1.5%
i | 11 | 1.5%
m | 11 | 1.5%

None
Value | Count | Frequency (%)
ã | 358 | 100.0%

com_recidiva_local_1
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.5 KiB

Value | Count
Não | 3941
Sim | 331

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 12816
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Não
2nd row: Sim
3rd row: Não
4th row: Não
5th row: Não

Common Values

Value | Count | Frequency (%)
Não | 3941 | 92.3%
Sim | 331 | 7.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: image not preserved]

Value | Count | Frequency (%)
não | 3941 | 92.3%
sim | 331 | 7.7%

Most occurring characters

Value | Count | Frequency (%)
N | 3941 | 30.8%
ã | 3941 | 30.8%
o | 3941 | 30.8%
S | 331 | 2.6%
i | 331 | 2.6%
m | 331 | 2.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 8544 | 66.7%
Uppercase Letter | 4272 | 33.3%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
ã | 3941 | 46.1%
o | 3941 | 46.1%
i | 331 | 3.9%
m | 331 | 3.9%

Uppercase Letter
Value | Count | Frequency (%)
N | 3941 | 92.3%
S | 331 | 7.7%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 12816 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
N | 3941 | 30.8%
ã | 3941 | 30.8%
o | 3941 | 30.8%
S | 331 | 2.6%
i | 331 | 2.6%
m | 331 | 2.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 8875 | 69.2%
None | 3941 | 30.8%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
N | 3941 | 44.4%
o | 3941 | 44.4%
S | 331 | 3.7%
i | 331 | 3.7%
m | 331 | 3.7%

None
Value | Count | Frequency (%)
ã | 3941 | 100.0%

com_recidiva_local_2
Categorical

IMBALANCE  MISSING 

Distinct: 2
Distinct (%): 0.5%
Missing: 3903
Missing (%): 91.4%
Memory size: 33.5 KiB

Value | Count
Não | 344
Sim | 25

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 1107
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Não
2nd row: Não
3rd row: Não
4th row: Não
5th row: Não

Common Values

Value | Count | Frequency (%)
Não | 344 | 8.1%
Sim | 25 | 0.6%
(Missing) | 3903 | 91.4%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: image not preserved]

Value | Count | Frequency (%)
não | 344 | 93.2%
sim | 25 | 6.8%

Most occurring characters

Value | Count | Frequency (%)
N | 344 | 31.1%
ã | 344 | 31.1%
o | 344 | 31.1%
S | 25 | 2.3%
i | 25 | 2.3%
m | 25 | 2.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 738 | 66.7%
Uppercase Letter | 369 | 33.3%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
ã | 344 | 46.6%
o | 344 | 46.6%
i | 25 | 3.4%
m | 25 | 3.4%

Uppercase Letter
Value | Count | Frequency (%)
N | 344 | 93.2%
S | 25 | 6.8%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1107 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
N | 344 | 31.1%
ã | 344 | 31.1%
o | 344 | 31.1%
S | 25 | 2.3%
i | 25 | 2.3%
m | 25 | 2.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 763 | 68.9%
None | 344 | 31.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
N | 344 | 45.1%
o | 344 | 45.1%
S | 25 | 3.3%
i | 25 | 3.3%
m | 25 | 3.3%

None
Value | Count | Frequency (%)
ã | 344 | 100.0%

Interactions

[Figures: 49 interaction plots (Matplotlib SVG); images not preserved]

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
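
Nullity correlation, as described above, can be sketched as the Pearson correlation between two presence indicators (1 when a cell is filled, 0 when it is missing). The example below is a minimal pure-Python illustration under that assumption. The function name `nullity_correlation` and the toy columns are hypothetical and do not come from the dataset, and the report's plotting code may compute the value differently.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation of the presence indicators of two columns.

    None marks a missing cell. Returns NaN when either column is
    entirely present or entirely missing (zero variance).
    """
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


# Toy columns (illustrative values only).
diagnosis_2 = ["2020-10-10", None, None, "2019-01-02"]
treatment_2 = ["2020-12-14", None, None, "2019-03-01"]
only_when_absent = [None, "x", "y", None]
record_id = [1, 2, 3, 4]

print(nullity_correlation(diagnosis_2, treatment_2))       # 1.0
print(nullity_correlation(diagnosis_2, only_when_absent))  # -1.0
print(nullity_correlation(diagnosis_2, record_id))         # nan
```

A value of 1.0 means the two columns are always filled or empty together, which is the pattern one would expect among the `_2` columns here, since each of them reports 3903 missing cells.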

Sample

record_idrepeat_instrument_1repeat_instrument_2repeat_instance_1repeat_instance_2data_da_primeira_consulta_institucional_dt_pci_1data_da_primeira_consulta_institucional_dt_pci_2data_do_diagnostico_1data_do_diagnostico_2codigo_da_topografia_cid_o_1codigo_da_topografia_cid_o_2codigo_da_morfologia_de_acordo_com_o_cid_o_1codigo_da_morfologia_de_acordo_com_o_cid_o_2estadio_clinico_1estadio_clinico_2grupo_de_estadio_clinico_1grupo_de_estadio_clinico_2classificacao_tnm_clinico_t_1classificacao_tnm_clinico_t_2classificacao_tnm_clinico_n_1classificacao_tnm_clinico_n_2classificacao_tnm_clinico_m_1classificacao_tnm_clinico_m_2metastase_ao_diagnostico_cid_o_1_1metastase_ao_diagnostico_cid_o_1_2metastase_ao_diagnostico_cid_o_2_1metastase_ao_diagnostico_cid_o_2_2metastase_ao_diagnostico_cid_o_3_1metastase_ao_diagnostico_cid_o_3_2metastase_ao_diagnostico_cid_o_4_1metastase_ao_diagnostico_cid_o_4_2data_do_tratamento_1data_do_tratamento_2combinacao_dos_tratamentos_realizados_no_hospital_1combinacao_dos_tratamentos_realizados_no_hospital_2ano_do_diagnostico_1ano_do_diagnostico_2lateralidade_do_tumor_1lateralidade_do_tumor_2data_de_recidiva_1data_de_recidiva_2tempo_desde_o_diagnostico_ate_a_primeira_recidiv_1tempo_desde_o_diagnostico_ate_a_primeira_recidiv_2local_de_recidiva_a_distancia_metastase_1_cid_o_topografia_1local_de_recidiva_a_distancia_metastase_1_cid_o_topografia_2local_de_recidiva_a_distancia_metastase_2_cid_o_topografia_1local_de_recidiva_a_distancia_metastase_2_cid_o_topografia_2local_de_recidiva_a_distancia_metastase_3_cid_o_topografia_1local_de_recidiva_a_distancia_metastase_3_cid_o_topografia_2local_de_recidiva_a_distancia_metastase_4_cid_o_topografia_1local_de_recidiva_a_distancia_metastase_4_cid_o_topografia_2descricao_da_morfologia_de_acordo_com_cid_o_1descricao_da_morfologia_de_acordo_com_cid_o_2descricao_da_topografia_1descricao_da_topografia_2classificacao_tnm_patologico_n_1classificacao_tnm_patologico_n_2classificacao_tnm_patologico_t_1classificacao_tnm_patologico_t_2com_recidiva_a_distancia_1com_recidiva_a_distancia_2com_recidiva_regional_1com_recidiva_regional_2com_recidiva_local_1com_recidiva_local_2
0302Registro De TumoresNaN1.0NaN2008-03-22NaN2008-03-23NaNC504NaN85003.0NaNIIANaNIINaN2NaN0NaN0NaNNaNNaNNaNNaNNaNNaNNaNNaN2008-08-15NaNCirurgia + Radio + Quimio + HormonioNaN2008.0NaNEsquerdaNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNCARCINOMA DUCTAL INFILTRANTE SOENaNMAMA QUADRANTE SUPERIOR EXTERNO DANaNNaNNaNNaNNaNNãoNaNNãoNaNNãoNaN
1710Registro De TumoresNaN1.0NaN2006-11-11NaN2007-11-11NaNC508NaN85003.0NaNIIIANaNIIINaN3NaN1NaN0NaNNaNNaNNaNNaNNaNNaNNaNNaN2008-05-29NaNCirurgia + QuimioterapiaNaN2008.0NaNEsquerdaNaN2014-07-19NaN2442.0NaNNaNNaNNaNNaNNaNNaNNaNNaNCARCINOMA DUCTAL INFILTRANTE SOENaNMAMA LESAO SOBREPOSTA DANaNNaNNaNNaNNaNNãoNaNSimNaNSimNaN
2752Registro De TumoresNaN1.0NaN2007-09-25NaN2007-12-18NaNC509NaN84803.0NaNIIANaNIINaN2NaN0NaN0NaNNaNNaNNaNNaNNaNNaNNaNNaN2008-04-07NaNOutras combinaçõesNaN2008.0NaNEsquerdaNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNADENOCARCINOMA MUCINOSONaNMAMA SOE (EXCLUI PELE DA MAMA C44.5)NaNX - nao foi possivel determinarNaN2NaNNãoNaNNãoNaNNãoNaN
31367Registro De TumoresNaN1.0NaN2008-02-03NaN2008-02-06NaNC505NaN85003.0NaNIIANaNIINaN1NaN1NaN0NaNNaNNaNNaNNaNNaNNaNNaNNaN2008-09-29NaNOutras combinaçõesNaN2008.0NaNEsquerdaNaN2010-07-15NaN890.0NaNC34 - Bronquios e PulmoesNaNC50 - MamaNaNNaNNaNNaNNaNCARCINOMA DUCTAL INFILTRANTE SOENaNMAMA QUADRANTE INFERIOR EXTERNO DANaN1NaN1ANaNNãoNaNSimNaNNãoNaN
41589Registro De TumoresNaN1.0NaN2008-05-15NaN2008-05-21NaNC508NaN85003.0NaNIIBNaNIINaN2NaN1NaN0NaNNaNNaNNaNNaNNaNNaNNaNNaN2008-09-16NaNCirurgia + Radio + QuimioNaN2008.0NaNDireitaNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNCARCINOMA DUCTAL INFILTRANTE SOENaNMAMA LESAO SOBREPOSTA DANaNNaNNaNNaNNaNNãoNaNNãoNaNNãoNaN
51705Registro De TumoresNaN1.0NaN2007-05-09NaN2007-05-10NaNC504NaN85003.0NaNIIANaNIINaN1NaN1NaN0NaNNaNNaNNaNNaNNaNNaNNaNNaN2007-12-06NaNCirurgia + RadioterapiaNaN2008.0NaNDireitaNaN2012-12-19NaN2050.0NaNC38 - Coração, Mediastino e Pleura,NaNC71 - EncefaloNaNC34 - Bronquios e PulmoesNaNC50 - MamaNaNCARCINOMA DUCTAL INFILTRANTE SOENaNMAMA QUADRANTE SUPERIOR EXTERNO DANaNNaNNaNNaNNaNNãoNaNSimNaNNãoNaN
61843Registro De TumoresNaN1.0NaN2008-12-07NaN2008-07-27NaNC509NaN85003.0NaNIIANaNIINaN1CNaN1NaN0NaNNaNNaNNaNNaNNaNNaNNaNNaN2009-01-25NaNQuimioterapiaNaN2008.0NaNnão se aplicaNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNCARCINOMA DUCTAL INFILTRANTE SOENaNMAMA SOE (EXCLUI PELE DA MAMA C44.5)NaN1NaN1CNaNNãoNaNNãoNaNNãoNaN
71873Registro De TumoresNaN1.0NaN2008-12-08NaN2008-08-30NaNC509NaN85003.0NaNIIBNaNIINaN2NaN1NaN0NaNNaNNaNNaNNaNNaNNaNNaNNaN2008-12-12NaNOutras combinaçõesNaN2008.0NaNEsquerdaNaN2016-02-29NaN2739.0NaNC71 - EncefaloNaNNaNNaNNaNNaNNaNNaNCARCINOMA DUCTAL INFILTRANTE SOENaNMAMA SOE (EXCLUI PELE DA MAMA C44.5)NaNNaNNaNNaNNaNSimNaNNãoNaNNãoNaN
81898Registro De TumoresNaN1.0NaN2008-08-23NaN2008-06-20NaNC509NaN85003.0NaNIVNaNIVNaN4NaN2NaN1NaNC41 - Ossos e Das Cartilagens Articulares de Outras LocalizaçõesNaNC22 - Fígado e Das Vias Biliares Intra-hepáticasNaNC77 - Secundária e Não Especificada Dos Gânglios LinfáticosNaNNaNNaN2008-10-21NaNQuimioterapiaNaN2008.0NaNEsquerdaNaN2009-08-14NaN420.0NaNC71 - EncefaloNaNNaNNaNNaNNaNNaNNaNCARCINOMA DUCTAL INFILTRANTE SOENaNMAMA SOE (EXCLUI PELE DA MAMA C44.5)NaNNaNNaNNaNNaNNãoNaNSimNaNNãoNaN
91960Registro De TumoresNaN1.0NaN2009-01-30NaN2008-07-28NaNC509NaN85003.0NaNIIIANaNIIINaN3NaN2NaN0NaNNaNNaNNaNNaNNaNNaNNaNNaN2009-01-30NaNOutras combinaçõesNaN2008.0NaNDireitaNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNCARCINOMA DUCTAL INFILTRANTE SOENaNMAMA SOE (EXCLUI PELE DA MAMA C44.5)NaNNaNNaNNaNNaNNãoNaNNãoNaNNãoNaN
record_idrepeat_instrument_1repeat_instrument_2repeat_instance_1repeat_instance_2data_da_primeira_consulta_institucional_dt_pci_1data_da_primeira_consulta_institucional_dt_pci_2data_do_diagnostico_1data_do_diagnostico_2codigo_da_topografia_cid_o_1codigo_da_topografia_cid_o_2codigo_da_morfologia_de_acordo_com_o_cid_o_1codigo_da_morfologia_de_acordo_com_o_cid_o_2estadio_clinico_1estadio_clinico_2grupo_de_estadio_clinico_1grupo_de_estadio_clinico_2classificacao_tnm_clinico_t_1classificacao_tnm_clinico_t_2classificacao_tnm_clinico_n_1classificacao_tnm_clinico_n_2classificacao_tnm_clinico_m_1classificacao_tnm_clinico_m_2metastase_ao_diagnostico_cid_o_1_1metastase_ao_diagnostico_cid_o_1_2metastase_ao_diagnostico_cid_o_2_1metastase_ao_diagnostico_cid_o_2_2metastase_ao_diagnostico_cid_o_3_1metastase_ao_diagnostico_cid_o_3_2metastase_ao_diagnostico_cid_o_4_1metastase_ao_diagnostico_cid_o_4_2data_do_tratamento_1data_do_tratamento_2combinacao_dos_tratamentos_realizados_no_hospital_1combinacao_dos_tratamentos_realizados_no_hospital_2ano_do_diagnostico_1ano_do_diagnostico_2lateralidade_do_tumor_1lateralidade_do_tumor_2data_de_recidiva_1data_de_recidiva_2tempo_desde_o_diagnostico_ate_a_primeira_recidiv_1tempo_desde_o_diagnostico_ate_a_primeira_recidiv_2local_de_recidiva_a_distancia_metastase_1_cid_o_topografia_1local_de_recidiva_a_distancia_metastase_1_cid_o_topografia_2local_de_recidiva_a_distancia_metastase_2_cid_o_topografia_1local_de_recidiva_a_distancia_metastase_2_cid_o_topografia_2local_de_recidiva_a_distancia_metastase_3_cid_o_topografia_1local_de_recidiva_a_distancia_metastase_3_cid_o_topografia_2local_de_recidiva_a_distancia_metastase_4_cid_o_topografia_1local_de_recidiva_a_distancia_metastase_4_cid_o_topografia_2descricao_da_morfologia_de_acordo_com_cid_o_1descricao_da_morfologia_de_acordo_com_cid_o_2descricao_da_topografia_1descricao_da_topografia_2classificacao_tnm_patologico_n_1classificacao_tnm_patologico_n_2classificacao_tnm_patologico_t_1classificacao_tnm_patologico_t_2com_recidiva_a_distancia_1com_recidiva_a_distancia_2com_recidiva_regional_1com_recidiva_regional_2com_recidiva_local_1com_recidiva_local_2
426282100Registro De TumoresNaN1.0NaN2020-07-24NaN2020-07-24NaNC504NaN85003.0NaNIIIBNaNNaNNaN4BNaN1NaN0NaNNaNNaNNaNNaNNaNNaNNaNNaN2020-10-07NaNCirurgia + Radio + QuimioNaN2020.0NaNEsquerdaNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNCARCINOMA DUCTAL INFILTRANTE SOENaNMAMA QUADRANTE SUPERIOR EXTERNO DANaNNaNNaNNaNNaNNãoNaNNãoNaNNãoNaN
426382111Registro De TumoresNaN1.0NaN2020-08-09NaN2020-06-27NaNC504NaN85002.0NaN0NaNNaNNaNISNaN0NaN0NaNNaNNaNNaNNaNNaNNaNNaNNaN2020-12-06NaNOutras combinaçõesNaN2020.0NaNDireitaNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNCARCINOMA INTRADUCTAL NAO INFILTRANTE SOENaNMAMA QUADRANTE SUPERIOR EXTERNO DANaNNaNNaNNaNNaNNãoNaNNãoNaNNãoNaN
426482112Registro De TumoresNaN1.0NaN2020-09-08NaN2020-09-29NaNC505NaN85003.0NaNIIIANaNNaNNaN3NaN1NaN0NaNNaNNaNNaNNaNNaNNaNNaNNaN2020-11-23NaNOutras combinaçõesNaN2020.0NaNDireitaNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNCARCINOMA DUCTAL INFILTRANTE SOENaNMAMA QUADRANTE INFERIOR EXTERNO DANaNNaNNaNNaNNaNNãoNaNNãoNaNNãoNaN
426582118Registro De TumoresNaN1.0NaN2020-01-28NaN2020-02-27NaNC509NaN85003.0NaNIANaNNaNNaN1CNaN0NaN0NaNNaNNaNNaNNaNNaNNaNNaNNaN2020-05-26NaNOutras combinaçõesNaN2020.0NaNDireitaNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNCARCINOMA DUCTAL INFILTRANTE SOENaNMAMA SOE (EXCLUI PELE DA MAMA C44.5)NaNNaNNaNNaNNaNNãoNaNNãoNaNNãoNaN
426682122Registro De TumoresNaN1.0NaN2020-11-04NaN2020-07-06NaNC509NaN85003.0NaNIIANaNNaNNaN2NaN0NaN0NaNNaNNaNNaNNaNNaNNaNNaNNaN2020-12-03NaNOutras combinaçõesNaN2020.0NaNEsquerdaNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNCARCINOMA DUCTAL INFILTRANTE SOENaNMAMA SOE (EXCLUI PELE DA MAMA C44.5)NaNNaNNaNNaNNaNNãoNaNNãoNaNNãoNaN
426782123Registro De TumoresRegistro De Tumores1.02.02020-12-042020-12-042020-10-102020-10-10C504C50985003.085003.0IIBIIANaNNaN320000NaNNaNNaNNaNNaNNaNNaNNaN2020-12-142020-12-14Outras combinaçõesOutras combinações2020.02020.0DireitaEsquerdaNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNCARCINOMA DUCTAL INFILTRANTE SOECARCINOMA DUCTAL INFILTRANTE SOEMAMA QUADRANTE SUPERIOR EXTERNO DAMAMA SOE (EXCLUI PELE DA MAMA C44.5)NaNNaNNaNNaNNãoNãoNãoNãoNãoNão
426882124Registro De TumoresRegistro De Tumores1.02.02020-06-202020-06-202020-09-052020-09-05C509C50985203.085002.0IV0NaNNaN4DCDIS1010C38 - Coração, Mediastino e Pleura,NaNC22 - Fígado e Das Vias Biliares Intra-hepáticasNaNC34 - Bronquios e PulmoesNaNC41 - Ossos e Das Cartilagens Articulares de Outras LocalizaçõesNaN2021-01-122021-01-12QuimioterapiaQuimioterapia2020.02020.0EsquerdaDireitaNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNCARCINOMA LOBULAR SOECARCINOMA INTRADUCTAL NAO INFILTRANTE SOEMAMA SOE (EXCLUI PELE DA MAMA C44.5)MAMA SOE (EXCLUI PELE DA MAMA C44.5)NaNNaNNaNNaNNãoNãoNãoNãoNãoNão
426982131Registro De TumoresNaN1.0NaN2020-11-01NaN2019-12-23NaNC502NaN85203.0NaNIIIANaNNaNNaN3NaN1NaN0NaNNaNNaNNaNNaNNaNNaNNaNNaN2020-12-23NaNCirurgia + RadioterapiaNaN2020.0NaNDireitaNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNCARCINOMA LOBULAR SOENaNMAMA QUADRANTE SUPERIOR INTERNO DANaNNaNNaNNaNNaNNãoNaNNãoNaNNãoNaN
427082205Registro De TumoresNaN1.0NaN2021-02-28NaN2020-11-07NaNC504NaN85003.0NaNIVNaNNaNNaN4DNaN1NaN1NaNC71 - EncefaloNaNC77 - Secundária e Não Especificada Dos Gânglios LinfáticosNaNC49 - Tecido Conjuntivo e de Outros Tecidos MolesNaNNaNNaN2021-03-27NaNCirurgia + Radio + QuimioNaN2020.0NaNEsquerdaNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNCARCINOMA DUCTAL INFILTRANTE SOENaNMAMA QUADRANTE SUPERIOR EXTERNO DANaNNaNNaNNaNNaNNãoNaNNãoNaNSimNaN
427182240Registro De TumoresNaN1.0NaN2020-12-08NaN2020-03-14NaNC505NaN85003.0NaNIIICNaNNaNNaN2NaN3ANaN0NaNNaNNaNNaNNaNNaNNaNNaNNaN2021-01-12NaNOutras combinaçõesNaN2020.0NaNDireitaNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNCARCINOMA DUCTAL INFILTRANTE SOENaNMAMA QUADRANTE INFERIOR EXTERNO DANaNNaNNaNNaNNaNNãoNaNNãoNaNNãoNaN